xia_diff_match_patch.diff.DiffMatchPatch

class xia_diff_match_patch.diff.DiffMatchPatch

Bases: object

Class containing the diff, match and patch methods.

Also contains the behaviour settings.

__init__()

Initializes a DiffMatchPatch object with default settings. Reassign these settings in your program to override the defaults.

Methods

__init__()

Initializes a DiffMatchPatch object with default settings.

diff_bisect(text1, text2, deadline)

Find the 'middle snake' of a diff, split the problem in two and return the recursively constructed diff.

diff_bisectSplit(text1, text2, x, y, deadline)

Given the location of the 'middle snake', split the diff in two parts and recurse.

diff_charsToLines(diffs, lineArray)

Rehydrate the text in a diff from a string of line hashes to real lines of text.

diff_cleanupEfficiency(diffs)

Reduce the number of edits by eliminating operationally trivial equalities.

diff_cleanupMerge(diffs)

Reorder and merge like edit sections.

diff_cleanupSemantic(diffs)

Reduce the number of edits by eliminating semantically trivial equalities.

diff_cleanupSemanticLossless(diffs)

Look for single edits surrounded on both sides by equalities which can be shifted sideways to align the edit to a word boundary.

diff_commonOverlap(text1, text2)

Determine if the suffix of one string is the prefix of another.

diff_commonPrefix(text1, text2)

Determine the common prefix of two strings.

diff_commonSuffix(text1, text2)

Determine the common suffix of two strings.

diff_compute(text1, text2, checklines, deadline)

Find the differences between two texts. Assumes that the texts do not have any common prefix or suffix.

diff_fromDelta(text1, delta)

Given the original text1, and an encoded string which describes the operations required to transform text1 into text2, compute the full diff.

diff_halfMatch(text1, text2)

Do the two texts share a substring which is at least half the length of the longer text? This speedup can produce non-minimal diffs.

diff_levenshtein(diffs)

Compute the Levenshtein distance: the number of inserted, deleted or substituted characters.

diff_lineMode(text1, text2, deadline)

Do a quick line-level diff on both strings, then rediff the parts for greater accuracy.

diff_linesToChars(text1, text2)

Split two texts into an array of strings.

diff_main(text1, text2[, checklines, deadline])

Find the differences between two texts. Simplifies the problem by stripping any common prefix or suffix off the texts before diffing.

diff_prettyHtml(diffs)

Convert a diff array into a pretty HTML report.

diff_text1(diffs)

Compute and return the source text (all equalities and deletions).

diff_text2(diffs)

Compute and return the destination text (all equalities and insertions).

diff_toDelta(diffs)

Crush the diff into an encoded string which describes the operations required to transform text1 into text2.

diff_xIndex(diffs, loc)

Given loc, a location in text1, compute and return the equivalent location in text2.

match_alphabet(pattern)

Initialise the alphabet for the Bitap algorithm.

match_bitap(text, pattern, loc)

Locate the best instance of 'pattern' in 'text' near 'loc' using the Bitap algorithm.

match_main(text, pattern, loc)

Locate the best instance of 'pattern' in 'text' near 'loc'.

patch_addContext(patch, text)

Increase the context until it is unique, but don't let the pattern expand beyond Match_MaxBits.

patch_addPadding(patches)

Add some padding on text start and end so that edges can match something.

patch_apply(patches, text)

Merge a set of patches onto the text.

patch_deepCopy(patches)

Given an array of patches, return another array that is identical.

patch_fromText(textline)

Parse a textual representation of patches and return a list of patch objects.

patch_make(a[, b, c])

Compute a list of patches to turn text1 into text2.

patch_splitMax(patches)

Look through the patches and break up any which are longer than the maximum limit of the match algorithm.

patch_toText(patches)

Take a list of patches and return a textual representation.

Attributes

BLANKLINEEND

BLANKLINESTART

DIFF_DELETE

DIFF_EQUAL

DIFF_INSERT

diff_bisect(text1, text2, deadline)

Find the ‘middle snake’ of a diff, split the problem in two and return the recursively constructed diff. See Myers’ 1986 paper, “An O(ND) Difference Algorithm and Its Variations.”

Parameters
  • text1 – Old string to be diffed.

  • text2 – New string to be diffed.

  • deadline – Time at which to bail if not yet complete.

Returns

Array of diff tuples.
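diff_bisect builds on the algorithm from Myers’ paper. As a rough illustration, the sketch below implements only the paper's simpler greedy forward pass, not the bidirectional middle-snake search this method performs: it returns D, the length of the shortest edit script, by tracking the furthest-reaching x on each diagonal k = x - y. The function name myers_edit_distance is hypothetical and the code is not taken from this library.

```python
def myers_edit_distance(a: str, b: str) -> int:
    """Return D, the number of insertions plus deletions in a shortest edit script."""
    n, m = len(a), len(b)
    # v[k] holds the furthest x reached so far on diagonal k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]  # step down: insert a character from b
            else:
                x = v[k - 1] + 1  # step right: delete a character from a
            y = x - k
            # Follow the "snake" of matching characters diagonally.
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always succeeds
```

For the example strings used in the paper, `myers_edit_distance("abcabba", "cbabac")` returns 5.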

diff_bisectSplit(text1, text2, x, y, deadline)

Given the location of the ‘middle snake’, split the diff in two parts and recurse.

Parameters
  • text1 – Old string to be diffed.

  • text2 – New string to be diffed.

  • x – Index of split point in text1.

  • y – Index of split point in text2.

  • deadline – Time at which to bail if not yet complete.

Returns

Array of diff tuples.

diff_charsToLines(diffs, lineArray)

Rehydrate the text in a diff from a string of line hashes to real lines of text.

Parameters
  • diffs – Array of diff tuples.

  • lineArray – Array of unique strings.
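Rehydration can be sketched as a per-character lookup: each character's code point is an index into lineArray. The helper below is hypothetical; like the documented method it returns nothing and rewrites the diff list in place, and the numeric operation codes are assumed to mirror upstream diff-match-patch (-1 delete, 1 insert, 0 equal).

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def chars_to_lines(diffs, line_array):
    """Replace each encoded character with the line it stands for, in place."""
    for i, (op, text) in enumerate(diffs):
        diffs[i] = (op, "".join(line_array[ord(ch)] for ch in text))
```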

diff_cleanupEfficiency(diffs)

Reduce the number of edits by eliminating operationally trivial equalities.

Parameters

diffs – Array of diff tuples.

diff_cleanupMerge(diffs)

Reorder and merge like edit sections. Merge equalities. Any edit section can move as long as it doesn’t cross an equality.

Parameters

diffs – Array of diff tuples.
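The merging part of this cleanup can be sketched as follows: the deletions and insertions between two equalities are gathered into at most one deletion followed by one insertion, adjacent equalities are merged, and empty entries are dropped. This hypothetical helper (cleanup_merge_simple) returns a new list instead of modifying its argument, and it omits the extra steps the upstream diff-match-patch version also performs, such as factoring common prefixes and suffixes out of the merged edits. The operation codes are assumed to mirror upstream.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def cleanup_merge_simple(diffs):
    merged = []
    pending_delete, pending_insert = "", ""
    # A trailing empty equality flushes any edits still pending at the end.
    for op, text in diffs + [(DIFF_EQUAL, "")]:
        if op == DIFF_DELETE:
            pending_delete += text
        elif op == DIFF_INSERT:
            pending_insert += text
        else:
            # Emit the gathered edit section: deletions first, then insertions.
            if pending_delete:
                merged.append((DIFF_DELETE, pending_delete))
            if pending_insert:
                merged.append((DIFF_INSERT, pending_insert))
            pending_delete, pending_insert = "", ""
            if text:
                if merged and merged[-1][0] == DIFF_EQUAL:
                    merged[-1] = (DIFF_EQUAL, merged[-1][1] + text)  # merge equalities
                else:
                    merged.append((DIFF_EQUAL, text))
    return merged
```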

diff_cleanupSemantic(diffs)

Reduce the number of edits by eliminating semantically trivial equalities.

Parameters

diffs – Array of diff tuples.

diff_cleanupSemanticLossless(diffs)

Look for single edits surrounded on both sides by equalities which can be shifted sideways to align the edit to a word boundary. For example: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.

Parameters

diffs – Array of diff tuples.

diff_commonOverlap(text1, text2)

Determine if the suffix of one string is the prefix of another.

Parameters
  • text1 – First string.

  • text2 – Second string.

Returns

The number of characters common to the end of the first string and the start of the second string.
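The overlap can be found by trying the longest candidate first: the largest k for which the last k characters of text1 equal the first k characters of text2. The hypothetical helper below is a straightforward version with quadratic worst-case cost, not necessarily the search the library uses.

```python
def common_overlap(text1: str, text2: str) -> int:
    # The overlap can never be longer than the shorter string.
    limit = min(len(text1), len(text2))
    for k in range(limit, 0, -1):
        if text1.endswith(text2[:k]):
            return k
    return 0
```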

diff_commonPrefix(text1, text2)

Determine the common prefix of two strings.

Parameters
  • text1 – First string.

  • text2 – Second string.

Returns

The number of characters common to the start of each string.
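Because the first n characters match for every n up to the answer and fail to match for every larger n, the prefix length can be found with a binary search over n that compares whole slices at each step. The sketch below is a hypothetical helper illustrating that idea; the library's own implementation may differ.

```python
def common_prefix_length(text1: str, text2: str) -> int:
    lo, hi = 0, min(len(text1), len(text2))
    # Invariant: the first `lo` characters match, and no prefix longer than `hi` can.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if text1[:mid] == text2[:mid]:
            lo = mid
        else:
            hi = mid - 1
    return lo
```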

diff_commonSuffix(text1, text2)

Determine the common suffix of two strings.

Parameters
  • text1 – First string.

  • text2 – Second string.

Returns

The number of characters common to the end of each string.
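A linear scan backwards from the end of both strings is the simplest way to count the shared suffix, as in this hypothetical helper:

```python
def common_suffix_length(text1: str, text2: str) -> int:
    n = 0
    limit = min(len(text1), len(text2))
    # Compare characters from the end until they differ or one string runs out.
    while n < limit and text1[-1 - n] == text2[-1 - n]:
        n += 1
    return n
```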

diff_compute(text1, text2, checklines, deadline)

Find the differences between two texts. Assumes that the texts do not have any common prefix or suffix.

Parameters
  • text1 – Old string to be diffed.

  • text2 – New string to be diffed.

  • checklines – Speedup flag. If false, then don’t run a line-level diff first to identify the changed areas. If true, then run a faster, slightly less optimal diff.

  • deadline – Time when the diff should be complete by.

Returns

Array of changes.
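Before running a full diff algorithm, a method like this can dispose of several easy cases directly. The sketch below is modeled on the upstream diff-match-patch design; whether this port handles the same cases in the same order is an assumption, as are the helper name compute_easy_cases and the operation codes (-1 delete, 1 insert, 0 equal). It relies on the same precondition as diff_compute: no common prefix or suffix.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def compute_easy_cases(text1, text2):
    """Return a diff for the easy cases, or None if a real diff algorithm is needed."""
    if not text1:
        return [(DIFF_INSERT, text2)]  # everything was added
    if not text2:
        return [(DIFF_DELETE, text1)]  # everything was removed
    longer, shorter = (text1, text2) if len(text1) > len(text2) else (text2, text1)
    i = longer.find(shorter)
    if i != -1:
        # The shorter text sits inside the longer one: the surrounding parts are
        # deletions if text1 is longer, insertions if text2 is longer.
        op = DIFF_DELETE if len(text1) > len(text2) else DIFF_INSERT
        return [(op, longer[:i]), (DIFF_EQUAL, shorter), (op, longer[i + len(shorter):])]
    if len(shorter) == 1:
        # A single character absent from the other text cannot be an equality.
        return [(DIFF_DELETE, text1), (DIFF_INSERT, text2)]
    return None
```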

diff_fromDelta(text1, delta)

Given the original text1, and an encoded string which describes the operations required to transform text1 into text2, compute the full diff.

Parameters
  • text1 – Source string for the diff.

  • delta – Delta text.

Returns

Array of diff tuples.

Raises

ValueError – If invalid input.
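Decoding a delta means walking the tab-separated tokens while keeping a pointer into text1: =N copies the next N characters as an equality, -N consumes them as a deletion, and +text adds a %xx-decoded insertion. The hypothetical from_delta below sketches that, raising ValueError for malformed tokens or a length mismatch. The operation codes and the use of urllib.parse.unquote for the %xx decoding are assumptions.

```python
from urllib.parse import unquote

DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def from_delta(text1, delta):
    diffs = []
    pointer = 0  # how much of text1 has been consumed so far
    for token in delta.split("\t"):
        if not token:
            continue  # tolerate empty tokens, e.g. a trailing tab
        op, param = token[0], token[1:]
        if op == "+":
            diffs.append((DIFF_INSERT, unquote(param)))
        elif op in ("-", "="):
            n = int(param)  # raises ValueError if param is not an integer
            if n < 0:
                raise ValueError("Negative number in delta: " + param)
            text = text1[pointer:pointer + n]
            pointer += n
            diffs.append((DIFF_EQUAL if op == "=" else DIFF_DELETE, text))
        else:
            raise ValueError("Invalid diff operation in delta: " + op)
    if pointer != len(text1):
        raise ValueError("Delta length (%d) does not equal source text length (%d)."
                         % (pointer, len(text1)))
    return diffs
```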

diff_halfMatch(text1, text2)

Do the two texts share a substring which is at least half the length of the longer text? This speedup can produce non-minimal diffs.

Parameters
  • text1 – First string.

  • text2 – Second string.

Returns

Five-element array containing the prefix of text1, the suffix of text1, the prefix of text2, the suffix of text2 and the common middle, or None if there was no match.
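One way to answer this question is to find the longest common substring and check that it is at least half as long as the longer text. The sketch below swaps in difflib.SequenceMatcher.find_longest_match for that search as a stand-in; it is not necessarily how the library searches, so results can differ when several candidates tie. The helper name half_match is hypothetical.

```python
from difflib import SequenceMatcher

def half_match(text1, text2):
    longer = text1 if len(text1) > len(text2) else text2
    matcher = SequenceMatcher(None, text1, text2, autojunk=False)
    a, b, size = matcher.find_longest_match(0, len(text1), 0, len(text2))
    # Require a shared substring at least half the length of the longer text.
    if size == 0 or size * 2 < len(longer):
        return None
    return (text1[:a], text1[a + size:], text2[:b], text2[b + size:], text1[a:a + size])
```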

diff_levenshtein(diffs)

Compute the Levenshtein distance: the number of inserted, deleted or substituted characters.

Parameters

diffs – Array of diff tuples.

Returns

Number of changes.
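The distance can be read straight off a diff: between two equalities, a run of deletions and insertions costs the larger of the two lengths, because paired characters count as substitutions. A hypothetical sketch, assuming the upstream operation codes:

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def levenshtein_from_diffs(diffs):
    total = insertions = deletions = 0
    for op, text in diffs:
        if op == DIFF_INSERT:
            insertions += len(text)
        elif op == DIFF_DELETE:
            deletions += len(text)
        else:
            # An equality closes the current edit section.
            total += max(insertions, deletions)
            insertions = deletions = 0
    return total + max(insertions, deletions)
```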

diff_lineMode(text1, text2, deadline)

Do a quick line-level diff on both strings, then rediff the parts for greater accuracy. This speedup can produce non-minimal diffs.

Parameters
  • text1 – Old string to be diffed.

  • text2 – New string to be diffed.

  • deadline – Time when the diff should be complete by.

Returns

Array of changes.

diff_linesToChars(text1, text2)

Split two texts into an array of strings. Reduce the texts to a string of hashes where each Unicode character represents one line.

Parameters
  • text1 – First string.

  • text2 – Second string.

Returns

Three-element tuple containing the encoded text1, the encoded text2 and the array of unique strings. The zeroth element of the array of unique strings is intentionally blank.
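The encoding can be sketched as follows: each distinct line (including its trailing newline) gets the next free index in a shared list, and the line is replaced by the character with that code point, so index 0 stays unused. The helper name lines_to_chars is hypothetical, and splitting only on "\n" is an assumption about how lines are delimited.

```python
import re

def lines_to_chars(text1, text2):
    line_array = [""]  # index 0 is intentionally blank
    line_hash = {}     # line -> its index in line_array

    def encode(text):
        chars = []
        # Split after each "\n", keeping it; a final unterminated line is kept too.
        for line in re.findall(r"[^\n]*\n|[^\n]+", text):
            if line not in line_hash:
                line_array.append(line)
                line_hash[line] = len(line_array) - 1
            chars.append(chr(line_hash[line]))
        return "".join(chars)

    return encode(text1), encode(text2), line_array
```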

diff_main(text1, text2, checklines=True, deadline=None)

Find the differences between two texts. Simplifies the problem by stripping any common prefix or suffix off the texts before diffing.

Parameters
  • text1 – Old string to be diffed.

  • text2 – New string to be diffed.

  • checklines – Optional speedup flag. If present and false, then don’t run a line-level diff first to identify the changed areas. Defaults to true, which does a faster, slightly less optimal diff.

  • deadline – Optional time when the diff should be complete by. Used internally for recursive calls. Users should set DiffTimeout instead.

Returns

Array of changes.
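The prefix/suffix stripping step can be sketched like this: equal inputs short-circuit, the shared prefix and suffix become equalities, and only the differing middle is handed to a real diff routine. The callback diff_middle and the naive_middle stand-in used to exercise it are hypothetical, as are the operation codes; os.path.commonprefix is used only because it compares any two strings character by character.

```python
import os

DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def diff_with_affixes_stripped(text1, text2, diff_middle):
    if text1 == text2:
        return [(DIFF_EQUAL, text1)] if text1 else []
    prefix = os.path.commonprefix([text1, text2])
    t1, t2 = text1[len(prefix):], text2[len(prefix):]
    # Reverse the remainders so commonprefix can find the shared suffix.
    suffix = os.path.commonprefix([t1[::-1], t2[::-1]])[::-1]
    if suffix:
        t1, t2 = t1[:-len(suffix)], t2[:-len(suffix)]
    diffs = diff_middle(t1, t2)
    if prefix:
        diffs = [(DIFF_EQUAL, prefix)] + diffs
    if suffix:
        diffs = diffs + [(DIFF_EQUAL, suffix)]
    return diffs

def naive_middle(t1, t2):
    # Hypothetical stand-in: treat the whole middle as one replacement.
    return [d for d in [(DIFF_DELETE, t1), (DIFF_INSERT, t2)] if d[1]]
```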

diff_prettyHtml(diffs)

Convert a diff array into a pretty HTML report.

Parameters

diffs – Array of diff tuples.

Returns

HTML representation.
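A report of this kind can be sketched by escaping each segment for HTML and wrapping it according to its operation. The tags and background colours below follow the upstream diff-match-patch output and are an assumption here, since this port's markup may differ; the helper name pretty_html is hypothetical.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def pretty_html(diffs):
    html = []
    for op, data in diffs:
        # Escape HTML metacharacters ("&" first) and make newlines visible.
        text = (data.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\n", "&para;<br>"))
        if op == DIFF_INSERT:
            html.append('<ins style="background:#e6ffe6;">%s</ins>' % text)
        elif op == DIFF_DELETE:
            html.append('<del style="background:#ffe6e6;">%s</del>' % text)
        else:
            html.append("<span>%s</span>" % text)
    return "".join(html)
```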

diff_text1(diffs)

Compute and return the source text (all equalities and deletions).

Parameters

diffs – Array of diff tuples.

Returns

Source text.

diff_text2(diffs)

Compute and return the destination text (all equalities and insertions).

Parameters

diffs – Array of diff tuples.

Returns

Destination text.
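Both reconstructions (diff_text1 and diff_text2) reduce to filtering the diff: the source text skips insertions and the destination text skips deletions. Hypothetical helpers, assuming the upstream operation codes:

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def source_text(diffs):
    # Equalities and deletions were present in text1.
    return "".join(text for op, text in diffs if op != DIFF_INSERT)

def destination_text(diffs):
    # Equalities and insertions are present in text2.
    return "".join(text for op, text in diffs if op != DIFF_DELETE)
```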

diff_toDelta(diffs)

Crush the diff into an encoded string which describes the operations required to transform text1 into text2. E.g. =3 -2 +ing -> Keep 3 chars, delete 2 chars, insert ‘ing’. Operations are tab-separated. Inserted text is escaped using %xx notation.

Parameters

diffs – Array of diff tuples.

Returns

Delta text.
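Encoding a delta can be sketched as one token per diff tuple, joined with tabs. The quote call and its safe-character set below are an assumption modeled on the upstream Python port of diff-match-patch; a tab inside inserted text must be escaped (here as %09) so it cannot be mistaken for a separator. The helper name to_delta is hypothetical.

```python
from urllib.parse import quote

DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def to_delta(diffs):
    parts = []
    for op, data in diffs:
        if op == DIFF_INSERT:
            # Inserted text is carried verbatim, %xx-escaped where needed.
            parts.append("+" + quote(data, safe="!~*'();/?:@&=+$,# "))
        elif op == DIFF_DELETE:
            parts.append("-%d" % len(data))
        else:
            parts.append("=%d" % len(data))
    return "\t".join(parts)
```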

diff_xIndex(diffs, loc)

Given loc, a location in text1, compute and return the equivalent location in text2. For example, for “The cat” vs “The big cat”, 1->1 and 5->9.

Parameters
  • diffs – Array of diff tuples.

  • loc – Location within text1.

Returns

Location within text2.
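The mapping can be computed by walking the diff while counting characters in both texts, stopping at the entry that covers loc. If that entry is a deletion, the location no longer exists in text2, so the position where the deleted span used to start is returned. A hypothetical sketch, assuming the upstream operation codes:

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0  # assumed upstream values

def x_index(diffs, loc):
    chars1 = chars2 = 0          # characters consumed in text1 / text2
    last_chars1 = last_chars2 = 0
    hit_op = None
    for op, text in diffs:
        if op != DIFF_INSERT:
            chars1 += len(text)
        if op != DIFF_DELETE:
            chars2 += len(text)
        if chars1 > loc:         # this entry covers loc
            hit_op = op
            break
        last_chars1, last_chars2 = chars1, chars2
    if hit_op == DIFF_DELETE:
        return last_chars2       # loc was deleted: map to the deletion point
    return last_chars2 + (loc - last_chars1)
```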

match_alphabet(pattern)

Initialise the alphabet for the Bitap algorithm.

Parameters

pattern – The text to encode.

Returns

Hash of character locations.
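The alphabet can be sketched as a dictionary from each pattern character to a bitmask marking where it occurs, with the first pattern character in the highest bit. That bit order follows upstream diff-match-patch and is an assumption for this port; the helper name is hypothetical.

```python
def match_alphabet_sketch(pattern):
    masks = {}
    for i, ch in enumerate(pattern):
        # Position i sets bit (len(pattern) - i - 1), so index 0 is the highest bit.
        masks[ch] = masks.get(ch, 0) | (1 << (len(pattern) - i - 1))
    return masks
```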

match_bitap(text, pattern, loc)

Locate the best instance of ‘pattern’ in ‘text’ near ‘loc’ using the Bitap algorithm.

Parameters
  • text – The text to search.

  • pattern – The pattern to search for.

  • loc – The location to search around.

Returns

Best match index or -1.
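The core of Bitap is a bitmask state updated with one shift and one AND per text character. The sketch below is only the exact-match (shift-and) variant: it allows no errors and ignores loc, whereas match_bitap locates the best, possibly approximate, match near loc. It builds its own masks with pattern[i] at bit i, and the name bitap_exact is hypothetical.

```python
def bitap_exact(text, pattern):
    m = len(pattern)
    if m == 0:
        return 0
    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    state = 0                # bit i set: pattern[:i + 1] ends at the current char
    goal = 1 << (m - 1)
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & goal:
            return j - m + 1
    return -1
```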

match_main(text, pattern, loc)

Locate the best instance of ‘pattern’ in ‘text’ near ‘loc’.

Parameters
  • text – The text to search.

  • pattern – The pattern to search for.

  • loc – The location to search around.

Returns

Best match index or -1.
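The quick paths can be sketched as below: identical texts match at 0, an empty text matches nothing, and an exact hit at loc is returned immediately. As a stand-in for the fuzzy search (presumably match_bitap) that a full implementation would fall back to, this hypothetical helper simply picks the exact occurrence nearest loc, so it returns -1 where match_main might still find an approximate match.

```python
def match_nearest_exact(text, pattern, loc):
    loc = max(0, min(loc, len(text)))  # clamp loc into the text
    if text == pattern:
        return 0
    if not text:
        return -1
    if text[loc:loc + len(pattern)] == pattern:
        return loc  # perfect match exactly at loc
    best = -1
    start = text.find(pattern)
    while start != -1:
        if best == -1 or abs(start - loc) < abs(best - loc):
            best = start
        start = text.find(pattern, start + 1)
    return best
```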

patch_addContext(patch, text)

Increase the context until it is unique, but don’t let the pattern expand beyond Match_MaxBits.

Parameters
  • patch – The patch to grow.

  • text – Source text.

patch_addPadding(patches)

Add some padding on text start and end so that edges can match something. Intended to be called only from within patch_apply.

Parameters

patches – Array of Patch objects.

Returns

The padding string added to each side.

patch_apply(patches, text)

Merge a set of patches onto the text. Return a patched text, as well as a list of true/false values indicating which patches were applied.

Parameters
  • patches – Array of Patch objects.

  • text – Old text.

Returns

Two-element array containing the new text and an array of boolean values.

patch_deepCopy(patches)

Given an array of patches, return another array that is identical.

Parameters

patches – Array of Patch objects.

Returns

Array of Patch objects.

patch_fromText(textline)

Parse a textual representation of patches and return a list of patch objects.

Parameters

textline – Text representation of patches.

Returns

Array of Patch objects.

Raises

ValueError – If invalid input.
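Each patch in the text form starts with a hunk header such as @@ -21,18 +22,17 @@. The sketch below parses one header into 0-based starts and lengths. The header grammar and the coordinate rules (a bare N means length 1, and a length of 0 means the start is already 0-based) follow upstream diff-match-patch and are an assumption for this port; parse_hunk_header is a hypothetical name.

```python
import re

HEADER = re.compile(r"^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@$")

def parse_hunk_header(line):
    m = HEADER.match(line)
    if not m:
        raise ValueError("Invalid patch string: " + line)

    def coords(start, length):
        start = int(start)
        if length == "":
            return start - 1, 1       # bare N: length 1 at 1-based position N
        if length == "0":
            return start, 0           # empty range: start is already 0-based
        return start - 1, int(length)

    start1, length1 = coords(m.group(1), m.group(2))
    start2, length2 = coords(m.group(3), m.group(4))
    return start1, length1, start2, length2
```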

patch_make(a, b=None, c=None)

Compute a list of patches to turn text1 into text2. Use diffs if provided, otherwise compute them. There are four ways to call this function, depending on what data is available to the caller:

  • Method 1: a = text1, b = text2

  • Method 2: a = diffs

  • Method 3 (optimal): a = text1, b = diffs

  • Method 4 (deprecated, use method 3): a = text1, b = text2, c = diffs

Parameters
  • a – text1 (methods 1,3,4) or Array of diff tuples for text1 to text2 (method 2).

  • b – text2 (methods 1,4) or Array of diff tuples for text1 to text2 (method 3) or None (method 2).

  • c – Array of diff tuples for text1 to text2 (method 4) or None (methods 1,2,3).

Returns

Array of Patch objects.
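Telling the four call forms apart only requires checking argument types. This hypothetical helper returns the method number, assuming texts are str, diffs are lists, and unused arguments are None (matching the signature above); raising ValueError for any other combination is also an assumption.

```python
def patch_make_form(a, b=None, c=None):
    """Return which documented call form (1-4) the arguments match."""
    if isinstance(a, str) and isinstance(b, str) and c is None:
        return 1  # text1, text2: diffs must be computed
    if isinstance(a, list) and b is None and c is None:
        return 2  # diffs only: text1 can be rebuilt from them
    if isinstance(a, str) and isinstance(b, list) and c is None:
        return 3  # text1 and diffs (optimal)
    if isinstance(a, str) and isinstance(b, str) and isinstance(c, list):
        return 4  # text1, text2, diffs (deprecated)
    raise ValueError("Unknown call format to patch_make.")
```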

patch_splitMax(patches)

Look through the patches and break up any which are longer than the maximum limit of the match algorithm. Intended to be called only from within patch_apply.

Parameters

patches – Array of Patch objects.

patch_toText(patches)

Take a list of patches and return a textual representation.

Parameters

patches – Array of Patch objects.

Returns

Text representation of patches.
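The reverse of header parsing, formatting one hunk header from 0-based coordinates, can be sketched as below. The coordinate rules mirror the upstream diff-match-patch patch text form and are an assumption for this port; format_hunk_header is a hypothetical name.

```python
def format_hunk_header(start1, length1, start2, length2):
    def coords(start, length):
        if length == 0:
            return "%d,0" % start              # empty range: keep the 0-based start
        if length == 1:
            return "%d" % (start + 1)          # single unit: 1-based start, no length
        return "%d,%d" % (start + 1, length)

    return "@@ -%s +%s @@" % (coords(start1, length1), coords(start2, length2))
```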