Text Diff Checker
Compare two text inputs side by side and see exactly what changed. Lines that were added appear in green, removed lines in red, and unchanged lines are dimmed. Get a quick summary with similarity percentage.
Text Diff: How Line-by-Line Comparison Works and Why It Matters
Comparing two pieces of text to identify their differences is one of the most fundamental operations in computing. Whether you are reviewing changes to a document, debugging code, or verifying that a translation matches the original, a text diff tool helps you see exactly what was added, removed, or left unchanged. This line-by-line comparison approach is the same principle that powers version control systems like Git, code review platforms, and document collaboration tools.
What Is a Text Diff?
A text diff (short for difference) is the output of comparing two text inputs and highlighting where they diverge. The comparison is typically performed line by line. Each line in the output is classified as one of three types: added (present in the second text but not the first), removed (present in the first text but not the second), or unchanged (identical in both texts).
The diff concept was formalized in the early days of Unix. The original diff utility was developed at Bell Labs in the early 1970s by Douglas McIlroy, working with James Hunt, and first shipped with the Fifth Edition of Unix in 1974. It worked by solving the Longest Common Subsequence (LCS) problem to find a minimal set of changes needed to transform one file into another. The same idea remains the foundation of most diff tools today, including the one used on this page.
The output of a diff is sometimes called a patch, especially in software development. A patch file contains just the differences between two versions of a file and can be applied to transform the old version into the new version. This approach is far more efficient than storing complete copies of every version, which is why diff-based systems form the backbone of modern version control.
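As a rough illustration of the idea that a diff carries enough information to recover either version, the sketch below uses Python's standard difflib module. The sample lists old and new are made up for the example. Note that an ndiff delta keeps every line, so it is more verbose than a compact patch file; it is used here only because restore() makes the round trip easy to see.

```python
import difflib

# Hypothetical sample inputs, one list entry per line of text.
old = ["apple", "banana", "cherry"]
new = ["apple", "blueberry", "cherry", "date"]

# ndiff yields one delta line per input line, each with a two-character prefix.
delta = list(difflib.ndiff(old, new))

# restore() rebuilds either original sequence from the delta alone.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))

print(recovered_old == old, recovered_new == new)
```

Real patch files go further by dropping most unchanged lines and recording line numbers instead, which is what keeps them small.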
The Longest Common Subsequence Algorithm
The Longest Common Subsequence (LCS) algorithm finds the longest sequence of lines that appear in both texts in the same order, though not necessarily consecutively. By identifying what the two texts have in common, the algorithm can determine which lines are unique to each text.
The algorithm works by building a matrix in which each cell holds the length of the LCS of the lines seen so far in each text. The matrix is filled using dynamic programming: if two lines match, the cell value is one plus the diagonal predecessor; otherwise, it takes the larger of the cell above and the cell to the left. After the matrix is complete, a backtracking step traces through it from the last cell to produce the actual diff output.
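The matrix fill and backtracking steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code this page runs: the function name lcs_diff, the (tag, line) output shape, and the tie-breaking rule during backtracking are all choices made for the example.

```python
def lcs_diff(a, b):
    """Return a list of (tag, line) pairs, where tag is ' ', '-', or '+'."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Backtrack from the bottom-right cell, collecting lines in reverse order.
    out = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            out.append((" ", a[i - 1]))  # line is part of the LCS
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            out.append(("+", b[j - 1]))  # line exists only in b
            j -= 1
        else:
            out.append(("-", a[i - 1]))  # line exists only in a
            i -= 1
    out.reverse()
    return out


old = ["apple", "banana", "cherry"]
new = ["apple", "blueberry", "cherry", "date"]
for tag, line in lcs_diff(old, new):
    print(tag, line)
```

Because additions are preferred on ties while walking backwards, a replaced line shows up in the final output as a removal followed by an addition, which matches the ordering most diff viewers use.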
The time complexity of this algorithm is O(m × n), where m and n are the number of lines in each text, and the matrix needs memory of the same order. For typical document comparisons with a few hundred lines, this runs in milliseconds. For very large texts, more advanced algorithms such as the Myers diff algorithm, whose running time grows with the number of differences rather than with the product of the input sizes, can improve performance, but the LCS approach provides clear, correct results for the vast majority of use cases.
Reading a Diff Output
In the standard unified diff format, each line is prefixed with a symbol indicating its status. A plus sign (+) marks lines that were added in the second text. A minus sign (-) marks lines that were removed from the first text. Unchanged lines, which provide context, begin with a single space in a true unified diff and are shown on this page with no visible prefix. Combined with color coding, these prefixes make it easy to scan large documents and quickly identify the important changes.
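To see these prefixes in practice, the following sketch generates a unified diff with Python's standard difflib.unified_diff. The file names a.txt and b.txt and the sample lines are placeholders chosen for the example.

```python
import difflib

# Hypothetical inputs; each line keeps its trailing newline.
old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

patch = list(difflib.unified_diff(old, new, fromfile="a.txt", tofile="b.txt"))

# The first lines are the file headers and an @@ hunk header;
# the body lines that follow start with ' ', '-', or '+'.
print("".join(patch), end="")
```

In the body of the output, the context lines for apple and cherry begin with a space, while banana carries a minus sign and blueberry a plus sign.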
When reviewing a diff, it helps to focus on the added and removed lines first, then use the unchanged lines as landmarks to understand where in the document the changes occurred. In many diff tools, unchanged lines are dimmed or collapsed so that the additions and removals stand out more prominently.
The statistics shown alongside the diff output — lines added, removed, unchanged, and the overall similarity percentage — provide a quantitative summary. A high similarity percentage with a few targeted additions or removals suggests a focused edit. A low similarity percentage indicates substantial rewriting or restructuring.
Common Use Cases for Text Diff
Software developers use diff tools constantly. Every pull request on GitHub, GitLab, or Bitbucket shows a diff of the proposed changes. Code reviews revolve around reading diffs, understanding what changed, and evaluating whether the changes are correct and well-structured. The ability to read diffs fluently is a core skill for professional developers.
Writers and editors use diff tools to track revisions in documents. When comparing draft versions of an article, a diff reveals exactly which sentences were rewritten and what new content was added. A paragraph that was moved shows up as a removal at its old location and an addition at its new one. Legal professionals use diff tools to compare contract versions and ensure that changes between revisions are intentional and authorized.
Translators use diff tools to verify that an updated source document has been fully reflected in the translation. By diffing the old and new source texts, the translator can identify which sections need to be re-translated. System administrators use diff tools to compare configuration files and detect unauthorized changes to server settings.
Tips for Effective Diff Comparison
For the most meaningful results, ensure that both texts use consistent formatting before comparing. Differences in whitespace, line endings, or indentation can create visual noise that obscures the actual content changes. Many diff tools offer options to ignore whitespace differences, though this tool compares lines exactly as entered.
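One way to reduce this kind of noise is to normalize both texts before pasting them in. The helper below is a minimal sketch: the name normalize is hypothetical, and the choice to unify line endings and strip trailing whitespace (while leaving indentation alone) is an assumption about what counts as noise for your content.

```python
def normalize(text):
    """Unify line endings and strip trailing whitespace from each line."""
    # Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Trailing spaces and tabs are invisible but would make lines compare unequal.
    return [line.rstrip() for line in text.split("\n")]


windows_text = "first line  \r\nsecond line\t\r\n"
unix_text = "first line\nsecond line\n"
print(normalize(windows_text) == normalize(unix_text))
```

Leading whitespace is kept on purpose, since indentation is meaningful in formats such as Python or YAML.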
When comparing long documents, consider comparing smaller sections independently. This can make the results easier to review and reduces the chance of the algorithm producing an alignment that is technically correct but confusing to read. Breaking the comparison into logical sections — such as chapter by chapter or function by function — often yields clearer results.
Finally, remember that a line-by-line diff treats any change within a line as a complete line replacement. If a single word changes in a paragraph-length line, the entire line will appear as both removed and added. For more granular comparison, consider reformatting the text so that each sentence or logical unit occupies its own line before running the diff.
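The reformatting step described above can be approximated with a short script. This is a deliberately naive sketch: the function name one_sentence_per_line is hypothetical, and the rule of splitting after a period, exclamation mark, or question mark followed by whitespace will mis-split abbreviations such as "e.g." or "Dr."

```python
import re


def one_sentence_per_line(text):
    """Split a paragraph into sentences so each can be diffed on its own line."""
    # Split on whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


paragraph = "The cat sat. The dog ran! Did it rain?"
print("\n".join(one_sentence_per_line(paragraph)))
```

Running both versions of a paragraph through the same splitter before comparing means a one-word edit marks only the affected sentence as changed.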
Frequently Asked Questions
How does the text diff comparison work?
This tool uses a Longest Common Subsequence (LCS) algorithm to compare the two texts line by line. It finds the longest sequence of matching lines between both inputs and marks everything else as either added or removed. The algorithm runs entirely in your browser — no data is sent to any server.
What do the colors in the diff output mean?
Green lines (prefixed with +) were added in Text B but are not present in Text A. Red lines (prefixed with -) are present in Text A but were removed in Text B. Dimmed lines with no prefix are unchanged: they appear in both texts in the same relative order, though not necessarily at the same line number.
How is the similarity percentage calculated?
Similarity is the number of unchanged lines divided by the total number of lines in the diff output, expressed as a percentage. Two identical texts have 100% similarity. Two completely different texts have 0% similarity. The metric measures structural similarity at the line level, not word or character level.
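The calculation described above can be written out as a small function. This sketch assumes the diff is available as a list of (tag, line) pairs with ' ' for unchanged, '-' for removed, and '+' for added; that representation, the function name, and the decision to treat two empty inputs as 100% similar are assumptions made for the example.

```python
def similarity_percent(diff):
    """Unchanged lines divided by total diff lines, as a percentage."""
    if not diff:
        # Assumption: two empty texts are identical, so report 100%.
        return 100.0
    unchanged = sum(1 for tag, _ in diff if tag == " ")
    return 100.0 * unchanged / len(diff)


# Hypothetical diff: 2 unchanged lines out of 5 total.
sample = [
    (" ", "apple"),
    ("-", "banana"),
    ("+", "blueberry"),
    (" ", "cherry"),
    ("+", "date"),
]
print(similarity_percent(sample))  # 40.0
```

Note that a single replaced line counts twice in the denominator, once as a removal and once as an addition, so the percentage falls faster than the share of lines edited.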
Can I compare code with this tool?
Yes. This tool works with any plain text, including source code, configuration files, JSON, XML, and more. Since it compares line by line, it works especially well with code and structured text where each line is a discrete unit.
Is there a size limit for the text inputs?
There is no hard character limit, but very large texts (thousands of lines) may take noticeably longer to process, since the comparison runs in your browser and its cost grows with the product of the two line counts. For typical document and code comparisons, the tool responds almost instantly.