- /**
- The idea is that we want to "attach" something (an annotation, edit, image, link, whatever)
- to a particular piece of text that is not necessarily delimited by an element,
- in other words, to some free-form text. Whether this text comes from HTML or from a text file
- is unimportant. The point is to find this attachment point even when:
- - the order of paragraphs is altered
- - the order of sentences in a paragraph is altered
- - the order of words in a sentence is altered
- We would also like to still find the attachment point with high probability when:
- - the words before, after and within the attachment point have been changed, deleted or added to.
- The basic idea is that, to create a memory
- of a certain location in the source, we extract multiple layers
- of features (or patterns, or signals).
- Our 'matching function', which ranks candidate attachment points
- by how closely we believe they match the remembered point,
- is a combination of scores derived from these signals.
- Some ideas I have for signals now are:
- - bag of words / word vector, take inner product to produce score
- - bag of letter trigrams / trigram vector, take inner product to produce score
- - exact match / 0 or 1 for mismatch or exact match to produce score
- - edit distance / alignment to produce score
- - word bigram vector, take inner product to produce score
- - paragraph index, symmetric difference to produce score
- - sentence index relative to document, symmetric difference to produce score
- - sentence index relative to paragraph, symmetric difference to produce score
- - first, middle or last, sentence, 0 or 1 to produce score
- - first, middle or last, paragraph, 0 or 1 to produce score
- - sentence prior, sentence after
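As a rough illustration of the vector signals listed above, here is a minimal sketch of a bag-of-words and a letter-trigram score. Every function name is hypothetical, and the word-splitting rule and the cosine scaling are assumptions rather than anything decided in these notes.

```javascript
// Hypothetical helpers; none of these names come from the notes above.

// Count occurrences of each item in a list, e.g. words or trigrams.
function countVector(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  return counts;
}

// Lowercased words. Splitting on runs of non-letter, non-digit characters
// is an assumption, not a rule from the notes.
function words(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Overlapping letter trigrams of the lowercased text.
function trigrams(text) {
  const chars = Array.from(text.toLowerCase());
  const out = [];
  for (let i = 0; i + 3 <= chars.length; i++) {
    out.push(chars.slice(i, i + 3).join(""));
  }
  return out;
}

// Plain inner product of two sparse count vectors.
function innerProduct(a, b) {
  let sum = 0;
  for (const [key, value] of a) {
    sum += value * (b.get(key) || 0);
  }
  return sum;
}

// Inner product scaled into [0, 1] (cosine similarity), so that long
// sentences do not win simply by containing more words. The scaling is an
// assumption; the notes only ask for an inner product.
function cosineScore(a, b) {
  const norm = Math.sqrt(innerProduct(a, a) * innerProduct(b, b));
  return norm === 0 ? 0 : innerProduct(a, b) / norm;
}
```

Because both vectors ignore order, a sentence whose words have been shuffled still scores 1 on the bag-of-words signal, which is the reordering tolerance the notes ask for.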
- So to make a memory, we record the exact text of the sentence we are memorizing.
- We also record the sentence and paragraph indices, and the values of the features we cannot
- compute from the extracted text alone (first, middle or last; sentence prior and sentence after).
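A memory record along these lines might look like the sketch below. The field names and the input shape (an array of paragraphs, each an array of sentence strings) are assumptions made for illustration only.

```javascript
// Hypothetical record shape; field names are illustrative only.
// `paragraphs` is assumed to be an array of paragraphs, each of which is
// an array of sentence strings.

// Label an index as "first", "middle" or "last" within a list.
// A single-item list is labelled "first".
function position(index, length) {
  if (index === 0) return "first";
  if (index === length - 1) return "last";
  return "middle";
}

function makeMemory(paragraphs, paragraphIndex, sentenceIndex) {
  const flat = paragraphs.flat();
  // Sentences in earlier paragraphs plus the offset inside this paragraph.
  const documentIndex = paragraphs
    .slice(0, paragraphIndex)
    .reduce((count, paragraph) => count + paragraph.length, 0) + sentenceIndex;
  const paragraph = paragraphs[paragraphIndex];
  return {
    text: paragraph[sentenceIndex],
    paragraphIndex,
    sentenceInParagraph: sentenceIndex,
    sentenceInDocument: documentIndex,
    sentencePosition: position(sentenceIndex, paragraph.length),
    paragraphPosition: position(paragraphIndex, paragraphs.length),
    sentencePrior: documentIndex > 0 ? flat[documentIndex - 1] : null,
    sentenceAfter: documentIndex + 1 < flat.length ? flat[documentIndex + 1] : null,
  };
}
```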
- Then, to compute a match, we use the following algorithm:
- - look for an exact match for the extract. If there is exactly one, we are done; otherwise continue
- - compute the values of all the signals for the extracted sentence and for every other sentence,
- possibly weighting each signal, then compute match scores between the signal values of the extracted sentence
- and the signal values of each other sentence. Rank these, breaking ties in aggregate score by earliest position in the document.
- - attempt to apply the edit, annotation, modification or whatever to the highest-ranked sentence. If it works, say:
- "The sentence we're editing has changed, and this may not be the sentence we were looking for. Click here to see the next 10 best
- candidates for the sentence we were looking for."
- If it doesn't work, attempt to apply it to each of the next 10 best matches in turn. If one works, display the same message as above.
- If none works, apply it anyway to the top-ranked sentence and leave a note that says:
- "The sentence we're editing has changed or moved, and we are not sure if this is the sentence we were looking for. Sorry.
- This can happen when the document was edited after we marked it. Click here to see the next 20 best candidates
- for the sentence we were looking for."
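The exact-match and ranking steps of the algorithm above can be sketched as follows. This is only an outline under stated assumptions: `rankCandidates` and the `{ weight, score }` signal shape are hypothetical names, and the weighted sum is one possible way of combining signals.

```javascript
// Hypothetical sketch. `signals` is assumed to be a list of objects of the
// form { weight, score }, where score(remembered, candidate) returns a number.
function rankCandidates(remembered, sentences, signals) {
  // Step 1: a unique exact match ends the search immediately.
  const exactIndices = [];
  sentences.forEach((sentence, index) => {
    if (sentence === remembered) exactIndices.push(index);
  });
  if (exactIndices.length === 1) {
    return { exact: true, ranked: [{ index: exactIndices[0], score: null }] };
  }

  // Step 2: weighted sum of all signal scores for every candidate sentence.
  const ranked = sentences.map((sentence, index) => {
    let score = 0;
    for (const signal of signals) {
      score += signal.weight * signal.score(remembered, sentence);
    }
    return { index, score };
  });

  // Highest score first; ties go to the sentence that appears earliest.
  ranked.sort((x, y) => (y.score - x.score) || (x.index - y.index));
  return { exact: false, ranked };
}
```

The caller would then try to apply the edit to `ranked[0]`, fall back to the next 10 entries, and pick the message to show based on `exact` and on which attempt succeeded.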
- // we break "sentences" on these marks
- const SEN_MARK = {
- en: [ ".", "'", '"', ":", ";", "!", "?", "()", "[]", "“”", "‘’" ],
- zh: [ "。", "「」", "﹁ ﹂", ";", ":", "!", "?", "()", "[]", "【】", "“”", "‘’", "《》", "〈〉"],
- es: [ ".", "'", '"', ":", ";", "¡!", "¿?", "()", "[]", "⟨⟩", "“”", "‘’", "‹›", "«»" ],
- hi: [ "|", ";", "?", "!", "”" ],
- ar: [ ".", "؟", ":", "“”" ]
- };
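One possible way to break text on these marks is sketched below. It assumes that every character of an entry, including both halves of a paired entry such as "()", ends a sentence; `EN_MARKS` and `splitSentences` are hypothetical names, and the mark list is a subset of the English entry chosen for the example.

```javascript
// Hypothetical subset of the English marks, chosen for illustration.
const EN_MARKS = [".", ":", ";", "!", "?", "()", "[]"];

// Split text into trimmed, non-empty pieces at any break character.
function splitSentences(text, marks) {
  // Paired entries such as "()" contribute both of their characters.
  const breakChars = new Set(marks.flatMap((mark) => Array.from(mark)));
  const pieces = [];
  let current = "";
  for (const ch of text) {
    if (breakChars.has(ch)) {
      if (current.trim()) pieces.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim()) pieces.push(current.trim());
  return pieces;
}
```

Treating quotes and brackets as breaks means a parenthetical becomes its own "sentence", which seems to be the intent of including paired marks in the lists.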
- The aim is to get as close as we can to the best possible result without understanding semantics.
- **/
- function remember( letter_index_from_source, sentence_text, source ) {
- }
- function find( sentence_text, source_dependent_scores, source ) {
- }