Real-time editing mode - LocalWiki Technical Development

This may not really be necessary. I think if we can make the normal wiki process work well, and work better, we can put this off for a while (or perhaps forever).

It's worth thinking about, though. Think email vs. IM -- different usages. Wiki (non-realtime) vs. Realtime wiki.

We should focus on non-realtime editing for now. Improving that process will do wonders. See showing changes.

For now, we are focusing on making the best possible wiki experience independent of real-time editing. Things like the perfect GUI editor, annotation, better diff viewing, should take priority -- they will actually improve quality. Real time editing may not be as useful. Really.

It's not clear that real-time editing would appreciably improve editing quality, though it would likely improve the ability of a couple editors, working together, to collaborate (via chat [google django + comet]). It's worth exploring and likely implementing, but I wouldn't say it's essential.

After we've completed the bulk of work on the base wiki (perfectly working GUI editor, annotations, amazing diff viewing) we can think about adding this stuff.

.. . . . . .

. . .

then…

After basic release made:

We focus on real-time editing as a means to further the wiki process -- collaboration on simple interlinked web pages.

would be amazing to show a prompt, "edit this in real time with <conflictor>" -- on an edit conflict. Or just when someone else starts editing the page you're editing.

http://code.google.com/p/google-mobwrite/ !!!

mobwrite doesn't have a rich-text editor built in. we'll have to hack it to send / receive HTML properly. might be easy. synchroedit has a rich text interface but it's all java and stuff, so we should only use it for reference.

See author's blog post (under "efficient change control of xml documents" re: rich text). read his paper on mobwrite's strategy and see "All the examples in this paper have shown synchronization of plain text. Differential synchronization can handle any content (plain text, rich text, bitmaps, vector graphics, etc) as long as a difference algorithm and a fuzzy patch algorithm is available for the content."

There's an 'iframe' branch that highlights changes by users. collaboration is still in plain text.

Pretty sure the moin dudes are also using plain text editing in their branch with this stuff. Check it out, obviously.

So the best way to solve the problem may be implementing Sebastian Rönnau's diff & merge/patch algos for XML (mentioned above; PDF: p2105-ronnau.pdf) on top of mobwrite. There's more info on plaintext here (diff-match-patch project). Looks like we want a tree-based diff, match and patch.

"Extending MobWrite is really easy. Include diff_match_patch.js, mobwrite_core.js and a new JS file that has get, set, and patch methods for the object to sync. Plus the constructor and share handler. mobwrite_form.js has lots of examples. "

Some dude says "Another approach would be simply to cleanup the HTML after applying the patches (e.g. BeautifulSoup can do this)." This approach might work well enough! Let's try this first and see how much it sucks.
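A rough sketch of what "clean up the HTML after patching" could look like. This is only an assumption about how we might do it: it uses the standard library's html.parser as a stand-in for BeautifulSoup, and the names TagBalancer and cleanup_html are made up for illustration. It only rebalances tags (closes ones left open, drops stray closing tags); it does not repair a tag that is missing a bracket.

```python
from html import escape
from html.parser import HTMLParser

# Tags that never take a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalancer(HTMLParser):
    """Re-emit HTML, dropping stray closing tags and tracking open ones."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag left behind by a patch: drop it
        # Close everything opened after the matching start tag, then the tag itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def cleanup_html(fragment):
    """Return fragment with balanced tags (hypothetical helper)."""
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

For example, cleanup_html("<p>Hello <b>world</p>") closes the dangling <b> before the </p>. Whether the repaired markup still means what the editors intended is a separate question, which is the part we'd have to test.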

look at differential diff/patch. see also this and this video

At some point we should explore a real-time editing mode. I think the best way to do this is to allow a given editor, when in the editing area, to invite others to edit in real-time on the same page. Saves, etc would work the same way. Like, a link under the editor that says something like "invite others to edit this draft with you." The key is we need to make this obvious, from a UI perspective -- that they are just inviting others to edit this draft in real-time, not this document. Wikis are living things, not documents (e.g. Google Docs) so this is harder to capture conceptually.

Look at that paper that Evan told me about.

Needs to be super good. Highlighting, etc. Better than etherpad.

Think about integration

In addition to the 'playback' and so forth - in Showing changes - we should think about recording these real-time edits

We could store a bundle of real time changes - tagged by user - associated with an actual concretely saved edit. Then when someone goes into the playback / showing changes mode they can also see real-time edits between concrete changes.

[ V_1 ] --> [ V_2 ] --> [ V_3 ] --> [ V_4]
         |           |           |
         |           |           |
     [ r edits ]   [ r edits ]  [ r edits ]
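To make the bundling idea concrete, here is a minimal sketch of one possible data model. Everything here is hypothetical (the class names, the fields, and the choice to attach each bundle to the save that ended the real-time session); it is not tied to any existing LocalWiki model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RealtimeEdit:
    user: str         # who made the edit
    patch: str        # serialized change; the patch format is an assumption
    timestamp: float  # when it happened, for ordering during playback


@dataclass
class SavedVersion:
    number: int
    content: str
    # Real-time edits made between the previous save and this one.
    realtime_edits: List[RealtimeEdit] = field(default_factory=list)


def playback(versions, include_realtime=False):
    """Return the ordered steps a 'showing changes' view could walk through."""
    steps = []
    for version in versions:
        if include_realtime:
            ordered = sorted(version.realtime_edits, key=lambda e: e.timestamp)
            steps.extend(("realtime", edit.user) for edit in ordered)
        steps.append(("saved", version.number))
    return steps
```

With include_realtime left off, playback only sees the concrete saves, so the normal history view would not need to change.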

Should this be in our first release?

Depends on time, of course. There's nothing out there, in the open source wiki world, that does this.

Etherpad

We should look at the etherpad source: http://code.google.com/p/etherpad/. It's Java stuff. And, again, their algorithm is just for plain text so it doesn't really help us much more than using mobwrite does. Mobwrite is probably a much better starting point for writing our real time editor.

We should model the UI on etherpad. It has colors for each user. It also has a good "time slider" mode.

general UI (etherpad)

Time slider UI (etherpad)

Note that 'saved' revisions are marked as stars on this timeline. We should do something similar with our normal wiki saves.

I also think that the chat is essential for real-time editing.

Another:

Differences

We have to recognize that wiki pages are longer-living than etherpad-type things. Etherpad is meant for collaboration on short-lived documents. Wiki pages are timeless, edited constantly, etc. In our case, we want to allow for real-time collaboration on pages as a distinct mode, but not have that be the dominant method of editing.

"invite" others to edit your version of the page with you. We should let ppl specify other wiki editors to collaborate in real time with them, if they want. Also a link for others to use, etc.

The real-time version isn't saved until it's explicitly saved.

We should think about whether or not we even want to save these real-time interactions

-- etherpad saves all the edits from everyone. Is this even useful? Wouldn't it be better to just allow collaboration in real time but not save the actual pre-save material?

UI integration idea

would be amazing to show a prompt, "edit this in real time with <conflictor>" -- on an edit conflict

Feasibility

If the "simply clean up the HTML" approach -- and some research into that and other methods -- worked out then this looks very feasible from a wiki-integration perspective. It would be amazing to have if we could get it right.

Hopefully this is possible without a shitload of work.

Misc notes

Comet stuff. http://orbited.org/ - has websockets API that does comet as fallback, check it out. http://clemesha.org/blog/2009/dec/17/realtime-web-apps-python-django-orbited-twisted/. also worth looking at: gunicorn, evserver (return an iterator instead of string in HttpResponse).
- http://blog.gevent.org/2009/10/10/simpler-long-polling-with-django-and-gevent/ -- check out gevent and examples (webchat example).
OT - operational transformation http://en.wikipedia.org/wiki/Operational_transformation. Tackles the same problem as the "differential synchronization" that mobwrite uses, but mobwrite is a 'patch based' approach. That works pretty well if everything in the material is a single unit (just text, not tree-structured xml).

ways around this: beautifulsoup cleanup, using a unique unicode symbol for every html element.

Guy on hacker news says:

OT is a dead end. The problems they struggle with could be trivially bruteforced by employing unique symbol identifiers. http://bouillon.math.usu.ru/articles/ctre.pdf

Is this symbol-based approach similar to what the mobwrite guy says about using unicode symbols for all HTML tags? Might be a rad way to do it.

diff-match-patch & HTML

from: http://code.google.com/p/google-diff-match-patch/wiki/Plaintext

The diff, match and patch algorithms in this library are plain text only. Attempting to feed HTML, XML or some other structured content through them may result in problems. Consider the case where a series of patches are applied to HTML content on a best-effort basis. One could be left with a <B> tag that starts but doesn't end, text falling between a </TD> and a <TD>, or a syntactically invalid tag missing a bracket.

The correct solution is to use a tree-based diff, match and patch. These employ totally different algorithms. I'm afraid I can't help you there.

Doing it anyway

However, depending on the task, there are sometimes some interesting ways to use text-based algorithms on structured content.

One method is to strip the tags from the HTML using a simple regex or node-walker. Then diff the HTML content against the text content. Don't perform any diff cleanups. This diff enables one to map character positions from one version to the other (see the diff_xIndex function). After this, one can apply all the patches one wants against the plain text, then safely map the changes back to the HTML. The catch with this technique is that although text may be freely edited, HTML tags are immutable.
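A simplified sketch of the strip-the-tags idea, for our own reference. Note the swap: instead of diffing the HTML against the text and using diff_xIndex, this version just records each text character's offset while stripping, which gives the same kind of position map. The function names are made up, and it assumes the HTML has no entities or "<" characters inside text.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_with_map(html):
    """Return (text, offsets); offsets[i] is the index in html of text[i]."""
    chars, offsets, pos = [], [], 0
    for match in TAG_RE.finditer(html):
        for i in range(pos, match.start()):
            chars.append(html[i])
            offsets.append(i)
        pos = match.end()
    for i in range(pos, len(html)):
        chars.append(html[i])
        offsets.append(i)
    offsets.append(len(html))  # sentinel, used when the text is empty
    return "".join(chars), offsets


def replace_in_html(html, start, end, new_text):
    """Replace plain-text range [start, end) with new_text; tags stay untouched."""
    _, offsets = strip_tags_with_map(html)
    doomed = set(offsets[start:end])  # html indices of the text being replaced
    if start < end:
        insert_at = offsets[start]
    elif start > 0:
        insert_at = offsets[start - 1] + 1  # pure insertion: stay next to previous char
    else:
        insert_at = offsets[0]
    out = []
    for i, ch in enumerate(html):
        if i == insert_at:
            out.append(new_text)
        if i not in doomed:
            out.append(ch)
    if insert_at == len(html):
        out.append(new_text)
    return "".join(out)
```

So an edit computed against "Hello world" can be applied to "<p>Hello <b>world</b></p>" without touching the markup. As the quote says, the catch is that tags themselves can never change this way.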

Another method is to walk the HTML and replace every opening and closing tag with a Unicode character. Check the Unicode spec for a range that is not in use. During the process, create a hash table of Unicode characters to the original tags. The result is a block of text which can be patched without fear of inserting text inside a tag or breaking the syntax of a tag. One just has to be careful when reconverting the content back to HTML that no closing tags are lost.
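A minimal sketch of the tag-to-Unicode-character idea. Assumptions on our side: the placeholders come from the Private Use Area starting at U+E000 (the quote only says to pick an unused range), identical tags share one placeholder, and tags are found with a simple regex rather than a real node-walker. encode_tags and decode_tags are hypothetical names.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
PUA_START = 0xE000  # first code point of the BMP Private Use Area (our choice)


def encode_tags(html):
    """Replace each tag with a single placeholder character.

    Returns (encoded_text, table) where table maps placeholder -> original tag.
    """
    table = {}    # placeholder char -> tag
    reverse = {}  # tag -> placeholder char, so identical tags share a char

    def swap(match):
        tag = match.group(0)
        if tag not in reverse:
            placeholder = chr(PUA_START + len(reverse))
            reverse[tag] = placeholder
            table[placeholder] = tag
        return reverse[tag]

    return TAG_RE.sub(swap, html), table


def decode_tags(text, table):
    """Turn placeholder characters back into their original tags."""
    return "".join(table.get(ch, ch) for ch in text)
```

The encoded string can then go through plain-text diff and patch, since each tag is one indivisible character. A patch can still delete a placeholder outright, so a check for lost closing tags (or an HTML cleanup pass) would be needed after decoding.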
