Here's my guide to what I think the perfect code golf / competitive
programming website would look like, based on experience at PPCG and
Anarchy Golf.


Challenges and solutions
========================

The site consists of three main types of posts: challenges, solutions,
and notes.  Solutions can be attached to challenges; notes can be
attached to anything.

Challenges and solutions each normally consist of code (in a supported
language, specified along with the code); in the simplest case (that
of code-golf challenges), a challenge is a program, and a solution is
another program with the same functionality.  (Design considerations:
having the spec written as a program makes it objective / encourages
thinking about corner cases, gets rid of "throwaway problems" which
don't have any work put into them, and makes automatic marking
possible.)  It's possible that we'll also want "freeform challenges"
where the challenge and answers are arbitrary text and the correctness
of the answers is assessed by hand.  There are other possible victory
conditions too (such as PPCG's `code-challenge`, `cops-and-robbers`,
and the like).

Challenges also contain a specification of the expected form of input
to a program.  In most cases, this will be expressed in JSON; each
supported language has a "standard wrapper" that takes the JSON and
converts it into whatever the most appropriate form for that language
is.  (This is intended to end debates about input formats by allowing
each language to specify its own, while preventing the dubious case in
which the choice of input format is used to convey information; it
also ends program vs. function vs. snippet arguments as each language
can have the most convenient general form in the wrapper.)  When a
solution is submitted to a challenge, the solution is tried with some
randomly chosen inputs, plus any testcases specified as notes on the
challenge (presumably because those cases are particularly hard to get
right or otherwise interesting).  If any case produces different
output from the original program (after conversion by the language's
wrapper), the solution is rejected and the submitter is told which
case failed.
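A minimal sketch of that judging step, assuming (hypothetically) that both the challenge program and the solution can be called as Python functions on already-wrapped input.  `wrap_input`, `wrap_output`, and `judge` are illustrative names rather than an existing API, and the random-input generator is assumed to be challenge-specific; a real site would run each program in its own language inside a sandbox.

```python
import json
import random
from typing import Any, Callable, Iterable, Optional

Program = Callable[..., Any]

def wrap_input(json_input: str) -> list:
    """Hypothetical Python "standard wrapper" for input: a JSON array becomes
    positional arguments; any other JSON value becomes a single argument."""
    value = json.loads(json_input)
    return value if isinstance(value, list) else [value]

def wrap_output(value: Any) -> str:
    """Hypothetical output wrapper: normalise results to JSON for comparison."""
    return json.dumps(value, sort_keys=True)

def judge(reference: Program, solution: Program,
          note_testcases: Iterable[str], random_cases: Iterable[str]) -> Optional[str]:
    """Return None if `solution` agrees with `reference` on every case,
    otherwise the JSON input of the first case that failed."""
    for case in [*note_testcases, *random_cases]:
        args = wrap_input(case)
        if wrap_output(solution(*args)) != wrap_output(reference(*args)):
            return case  # reported back to the submitter
    return None

# Example challenge: digit sum of a non-negative integer.
def reference(n):
    # The challenge program itself: written to be clear, not golfed.
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total

golfed = lambda n: sum(map(int, str(n)))
broken = lambda n: n % 9  # wrong whenever the digit sum is 9 or more

rng = random.Random(0)
random_cases = [json.dumps(rng.randint(0, 10**6)) for _ in range(100)]
print(judge(reference, golfed, ["0", "10"], random_cases))  # prints None
print(judge(reference, broken, ["0", "9"], random_cases))   # prints 9
```

Testing the note testcases before the random ones is just a choice here; it means a known-hard case is the one reported when several fail.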

If a user suspects that a solution is incorrect, they can submit their
own testcase as a note to it (or to the challenge, if it's likely to
be of value to more than one solution).  If that testcase fails, the
solution is automatically tagged as `incorrect`, is placed on a
deletion timer (being deleted after, say, a week if it's not fixed in
the meantime), and its author notified so that they can fix it.  (Even
if it succeeds it's probably still worth having it around so that
other people can see that it works.)  It's also possible to request
that a test case be re-run, in order to catch answers that depend on
the date or similar nondeterministic values.
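Here's a rough sketch of what happens when a user's testcase note fails, assuming the testcase has already been run.  The `Solution`, `Notifier`, `on_testcase_note`, and `sweep` names are hypothetical, the one-week grace period is the placeholder from the text, and clearing the tag once the author fixes things is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set

DELETION_GRACE = timedelta(weeks=1)  # "say, a week": a placeholder value

@dataclass
class Solution:
    author: str
    tags: Set[str] = field(default_factory=set)
    delete_after: Optional[datetime] = None  # set while tagged `incorrect`

@dataclass
class Notifier:
    """Hypothetical stand-in for the site's notification system."""
    sent: List[str] = field(default_factory=list)

    def notify(self, user: str, message: str) -> None:
        self.sent.append(f"{user}: {message}")

def on_testcase_note(solution: Solution, passes: bool,
                     now: datetime, notifier: Notifier) -> None:
    """React to a user-submitted testcase note once it has been run.
    `passes` says whether the solution matched the challenge on that case."""
    if passes:
        return  # the note stays, as public evidence that the solution works
    solution.tags.add("incorrect")
    # Assumption: a second failing note doesn't restart the deletion timer.
    if solution.delete_after is None:
        solution.delete_after = now + DELETION_GRACE
    notifier.notify(solution.author, "a testcase on your solution failed")

def sweep(solutions: List[Solution], now: datetime) -> List[Solution]:
    """Periodic job: keep only solutions whose deletion timer hasn't expired."""
    return [s for s in solutions
            if s.delete_after is None or now < s.delete_after]

sol = Solution(author="alice")
notes = Notifier()
on_testcase_note(sol, passes=False, now=datetime(2024, 1, 1), notifier=notes)
```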

Challenges initially start in a "sandbox" state (with a `sandbox`
tag), allowing them to be freely edited, but preventing submission of
solutions.  Once the `sandbox` tag is voted off the challenge, it
becomes open for solutions, but the challenge itself can no longer be
edited.  Notes attached to it can still be edited, though, and those
would be used to express, say, the specification in English (rather
than in a programming language).  Voting off the `sandbox` tag is only
recommended after testing the challenge specification to ensure that
it's correct (in that it implements the same thing that a
natural-language version of the challenge would), clear / easy to
read, and covers all the possibilities that it'd be expected to.  (A
challenge specification is meant to be clear rather than golfed;
golfing it is actively bad in this case.)
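The sandbox rule amounts to a simple gate: the `sandbox` tag both permits editing and forbids submissions.  A minimal sketch, where `Challenge`, `NotAllowed`, `edit_spec`, and `submit` are hypothetical names and removing the tag directly stands in for it being voted off:

```python
from dataclasses import dataclass, field
from typing import List, Set

class NotAllowed(Exception):
    """Raised when an action isn't permitted in the challenge's current state."""

@dataclass
class Challenge:
    spec_source: str
    # New challenges start in the sandbox.
    tags: Set[str] = field(default_factory=lambda: {"sandbox"})
    solutions: List[str] = field(default_factory=list)

    @property
    def sandboxed(self) -> bool:
        return "sandbox" in self.tags

    def edit_spec(self, new_source: str) -> None:
        if not self.sandboxed:
            raise NotAllowed("challenge has left the sandbox; add or edit a note instead")
        self.spec_source = new_source

    def submit(self, solution_source: str) -> None:
        if self.sandboxed:
            raise NotAllowed("challenge is still in the sandbox")
        self.solutions.append(solution_source)

c = Challenge("def f(n): return n * 2")
c.edit_spec("def f(n):\n    return n + n")  # allowed while sandboxed
c.tags.discard("sandbox")                    # stands in for the tag being voted off
c.submit("f=lambda n:n*2")
```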

For each user, solutions to challenges are initially hidden (but can
be shown on request); they're also hidden by default for logged-out
users, with an option to show them.  This lets people attempt
challenges spoiler-free and try to tie each other's scores.  Solutions
can't be modified once posted, but the same user can submit multiple
solutions even if they're very similar, and users can submit solutions
which are identical to or trivial improvements of others' solutions
(this is kind of implied by hiding others' solutions, as there'd be no
way to check for duplicates).  Identical solutions are merged
(creating a combined authors list for a solution showing all the
people who had submitted it), although authors that were spoiled when
they submitted the solution are marked (Anarchy Golf uses red for
this), as it's possible that they might have copied the ideas from
someone else's solution rather than coming up with it themself.  (Of
course, you could cheat using a second account / logging out, but I
imagine few people would do that and it might be possible to tell if
someone was doing it persistently.)  When solutions are very similar
(e.g. as measured by Levenshtein distance), only the best (based on
the challenge's objective scoring criterion) is shown; the others are
hidden by default (but can be displayed on request, and aren't hidden
to their author even by default).
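A sketch of both mechanisms: merging identical submissions into one post with a combined authors list, and hiding near-duplicates by Levenshtein distance.  It assumes the objective criterion is code length (a code-golf challenge), and the distance threshold of 3 is an arbitrary placeholder; `submit`, `visible`, and the record types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Author:
    name: str
    spoiled: bool  # had they viewed others' solutions first? (shown in red)

@dataclass
class Solution:
    code: str
    authors: List[Author] = field(default_factory=list)

def submit(pool: Dict[str, Solution], code: str, author: str, spoiled: bool) -> Solution:
    """Identical code merges into one post with a combined authors list."""
    sol = pool.setdefault(code, Solution(code))
    if all(a.name != author for a in sol.authors):
        sol.authors.append(Author(author, spoiled))
    return sol

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def visible(solutions: List[Solution], viewer: str, threshold: int = 3) -> List[Solution]:
    """Show the best (shortest) solution of each near-duplicate cluster;
    authors always see their own solutions."""
    best: List[Solution] = []   # one representative per cluster
    shown: List[Solution] = []
    for sol in sorted(solutions, key=lambda s: len(s.code)):
        if all(levenshtein(sol.code, b.code) > threshold for b in best):
            best.append(sol)
            shown.append(sol)
        elif any(a.name == viewer for a in sol.authors):
            shown.append(sol)
    return shown

pool: Dict[str, Solution] = {}
submit(pool, "print(sum(map(int,input())))", "alice", spoiled=False)
submit(pool, "print(sum(map(int,input())))", "bob", spoiled=True)
submit(pool, "print(sum(map(int, input())))", "carol", spoiled=False)
sols = list(pool.values())
```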


Tags
====

Every post can have a number of tags attached.  Each tag has a score,
determining how well it fits.  The creator of a post can set its
initial set of tags; each of these starts with a score of 3.  Users
can also vote on tags they think should be on the post; any tag
(whether it's on a post or not) can be incremented or decremented with
respect to that post; the tag is added to the post when its score
reaches 3 and removed when it falls to 0.  (Tags can be downvoted into
the negatives, but that hardly matters; the main purpose of allowing
it is to make it harder to add the tag when there's a dispute,
avoiding the oscillation problems seen with, e.g., Stack Exchange's
close votes.)
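Because the add and remove thresholds differ, a tag scored 1 or 2 keeps whatever state it was already in, and a tag downvoted into the negatives needs extra upvotes before it appears.  A minimal sketch of that rule, with `PostTags` as a hypothetical class:

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set

ADD_AT = 3     # a tag is added to the post once its score reaches 3...
REMOVE_AT = 0  # ...and removed only once its score falls to 0

class PostTags:
    """Tag scores for a single post (hypothetical class)."""

    def __init__(self, initial: Iterable[str] = ()) -> None:
        self.score: DefaultDict[str, int] = defaultdict(int)
        self.on_post: Set[str] = set()
        for tag in initial:  # the creator's tags start at 3, so they're on the post
            self.score[tag] = ADD_AT
            self.on_post.add(tag)

    def vote(self, tag: str, delta: int) -> None:
        """Apply one vote (+1 or -1) to any tag, on the post or not.
        Scores are allowed to go negative."""
        if delta not in (1, -1):
            raise ValueError("a vote is +1 or -1")
        self.score[tag] += delta
        if self.score[tag] >= ADD_AT:
            self.on_post.add(tag)
        elif self.score[tag] <= REMOVE_AT:
            self.on_post.discard(tag)
        # Scores of 1 or 2 leave the tag's current state unchanged.

t = PostTags(initial=["builtin"])
t.vote("builtin", -1)      # score 2: still on the post
t.vote("cheat", -1)
t.vote("cheat", -1)        # score -2
for _ in range(4):
    t.vote("cheat", +1)    # back up to 2: still not on the post
```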

The voting guidelines for tags are that users should attempt to vote a
tag on a post up to a value that depends on how well the post fits the
tag (based on comparison with other similar posts, e.g. other
solutions to the same challenge).  So people should become more
reluctant to increment a tag the higher the tag is currently voted:
at 4 or 5 or so, incrementing almost blindly for any post that it fits
would make sense, but at much higher values like 20 you'd want to
increment only if the post was an exemplary instance of the tag.  It's
even defensible to decrement a tag that fits, if you think
it's been voted out of proportion to the post's actual worth.

Tags are used to help people find challenges and solutions they're
interested in, and to rate the "subjectively" best solutions.  In
particular, solutions are tagged according to the techniques they use,
e.g. a `cheat` tag on a solution means that the solution is exploiting
a bug or other deficiency in the challenge specification, a `builtin`
tag would mean that the solution solves the core of the challenge
using a language feature designed specifically for the purpose, and
the like.  Other solution tags would involve things like using a more
efficient algorithm than the shortest, showing off an alternative
technique, etc.  It's also possible to imagine tags like `slow` that
show that an answer is valid but doesn't run fast enough to test; this
might allow an answer to be submitted even if the system can't verify
it as correct (with a matching tag `incorrect` or the like that acts
the same way as a failed testcase if voted onto the post).  Finally,
some purely subjective tags like `interesting` are provided to let
people express their opinions on a post.

It's probably clearest UI-wise for challenge tags, solution tags, and
note tags to all be separate sets.  This implies some sort of tag
creation process.  To start with, this could be done by moderators on
advice from meta-discussion forums, and a more automated process
(e.g. voting on them) could be added later.

To avoid a problem seen in practice with PPCG's voting system (posts
sorted to the top by votes are seen first and so attract still more
votes), solution tags cannot be voted on when the solutions are sorted
by a subjective sort order (such as score in a given tag).  An
objective sort order (program length, newest submission, etc.) must be
in use before tags (other than moderation tags) can be voted on.  This
means that a good, older answer will have less of an advantage over an
even better, newer answer.
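A minimal sketch of the check, assuming the client reports which sort order the voter was viewing and that "moderation tags" means the red tags described below.  `SortOrder` and `can_vote` are hypothetical names.

```python
from enum import Enum

class SortOrder(Enum):
    # Objective orders: derived from the posts themselves, not from votes.
    LENGTH = "length"
    NEWEST = "newest"
    # Subjective order: derived from tag votes.
    TAG_SCORE = "tag_score"

OBJECTIVE = {SortOrder.LENGTH, SortOrder.NEWEST}
MODERATION_TAGS = {"spam", "offtopic", "duplicate"}  # assumed: the red tags

def can_vote(tag: str, current_sort: SortOrder) -> bool:
    """Solution tags can only be voted on under an objective sort order;
    moderation tags can be voted on regardless."""
    return tag in MODERATION_TAGS or current_sort in OBJECTIVE
```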

In addition to being used for sorting and searching, tags are also
used to give users arbitrary numbers (reputation-style) to compete
over: the total score of all a user's posts in each tag is recorded on
a page for that user.  This doesn't directly do anything (although we
might have, e.g., leaderboards).  Unlocking site features
automatically upon gaining reputation turned out to cause huge trouble
for PPCG, because things like extra moderator tools often make using
the site worse for the people who have them, giving a perverse
incentive not to contribute to the site.  (It might
be viable to allow people to apply for extra privileges, if we even
have any, upon getting particular scores, though.)
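The per-user totals are just a sum over the user's posts, tag by tag.  A sketch, assuming a hypothetical minimal post record of `(authors, {tag: score})` and that every author of a merged post gets the full score (how spoiled co-authors should be credited is left open):

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimal post record: (authors, {tag: score}).
Post = Tuple[List[str], Dict[str, int]]

def user_tag_totals(posts: Iterable[Post]) -> Dict[str, Counter]:
    """Sum each user's post scores per tag, for display on their user page."""
    totals: Dict[str, Counter] = {}
    for authors, tag_scores in posts:
        for user in authors:
            # Counter.update adds values (including negative ones).
            totals.setdefault(user, Counter()).update(tag_scores)
    return totals

posts = [(["alice"], {"interesting": 5, "builtin": 3}),
         (["alice", "bob"], {"interesting": 4})]
totals = user_tag_totals(posts)
```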

There are also some tag-like constructions that are added
automatically to posts: things like the length of a program and the
language it's written in are tag-like from a searching and sorting
point of view, but there's obviously no point in voting on objective
things like that.  These pseudo-tags therefore can't be voted on, but
otherwise act like tags.  They may be measured in unusual units
(e.g. bytes), but are converted into higher-is-better numerical scores
for display on a score page (e.g. the number of users you outgolfed
plus a constant).
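A sketch of the length conversion suggested above, assuming each user is represented by their best length in UTF-8 bytes and that `BASE` (a hypothetical constant) is 1:

```python
from typing import Dict

BASE = 1  # hypothetical constant, so even the longest entry scores above zero

def byte_length(code: str) -> int:
    """Program length in bytes, assuming UTF-8 encoding."""
    return len(code.encode("utf-8"))

def length_scores(best_length: Dict[str, int]) -> Dict[str, int]:
    """Turn lower-is-better byte counts into higher-is-better scores:
    the number of users each user strictly outgolfed, plus BASE."""
    return {
        user: BASE + sum(1 for n in best_length.values() if n > mine)
        for user, mine in best_length.items()
    }

scores = length_scores({"alice": 40, "bob": 42, "carol": 40, "dave": 55})
```

Ties (like alice and carol here) outgolf neither of each other, so they get the same score.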

Finally, tags are also used for moderation purposes.  A few "red
tags", like `spam`, `offtopic`, and `duplicate`, exist; if one is
voted onto a post, the post is hidden from anonymous users and the
moderators get some sort of alert; if it's voted substantially higher
(maybe to somewhere in the 6-8 range), the post gets deleted.
These tags apply to all posts, and work as the easiest way to
self-moderate the site.  (If a *user* gets a very high score in a red
tag, that may be evidence to do something about the account!)  Tags
with a special meaning to the system, like `sandbox` and `testcase`,
are also likely to have a colour of their own to indicate this (and
there are no immediate implications from having them on your userpage;
who knows, it might be nice to be known as someone who writes really
good test cases.)
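A sketch of how a post's visibility could be derived from its red tags.  It takes the set of tags currently on the post (so the add-at-3 / remove-at-0 rule still applies), uses 7 as a placeholder deletion threshold from the suggested 6-8 range, and leaves the moderator alert as a side effect for the caller; `Visibility` and `moderation_state` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set

RED_TAGS = {"spam", "offtopic", "duplicate"}
DELETE_AT = 7  # placeholder from the suggested 6-8 range

class Visibility(Enum):
    NORMAL = "normal"
    HIDDEN_FROM_ANONYMOUS = "hidden from anonymous users"  # moderators also alerted
    DELETED = "deleted"

def moderation_state(scores: Dict[str, int], on_post: Set[str]) -> Visibility:
    """Derive a post's visibility from the red tags currently on it.
    `on_post` is the set of tags that have been voted onto the post."""
    red = [tag for tag in on_post if tag in RED_TAGS]
    if any(scores.get(tag, 0) >= DELETE_AT for tag in red):
        return Visibility.DELETED
    if red:
        return Visibility.HIDDEN_FROM_ANONYMOUS
    return Visibility.NORMAL
```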


Notes
=====

Challenges and solutions serve to contain the actual "competition"
part of the site from an objective point of view.  However, PPCG has
shown that golfing sites also benefit from educational aspects,
discussion, and the like.

Notes are attachments to other sorts of posts that add additional
information to them, normally (but not always) in natural language.
Here are some examples of things that would be notes:

  * Natural-language specifications of a challenge (to accompany the
    machine-readable specification that's used for scoring purposes);
    many people will benefit from additional background, an
    easier-to-read spec, and the like.  This is particularly important
    if the challenge is something like "replicate the behaviour of
    this obscure command from an esolang" and the challenge itself is
    specified simply by giving that command in that esolang.

  * Hand-picked test cases for a challenge that show off corner cases
    or other cases that could do with checking (if tagged `testcase`,
    these have a special meaning to the system, in that the author
    specifies the input but the system shows the matching output, and
    they're also used to verify the correctness of answers).

  * Explanations of how a solution works.  Note that having these
    separate from the solution itself means that people can explain
    other people's solutions without awkwardness (on PPCG, if you want
    to explain someone else's solution, neither an edit nor a comment
    really does the job well).
    
  * Explanations of the provenance of a challenge or solution; e.g. if
    multiple users work on a solution together, they can all submit
    the solution, then add a note to it explaining that it was a
    collaborative effort.  (If all the users submit the same solution
    unspoiled, they all get full credit for its tag scores, because
    the submissions are merged into a single post.)
  
  * Anything that people use comments for on PPCG, regardless of
    whether it's something that Stack Exchange would approve of:
    questions about the post, suggestions for other topics to cover,
    etc.

The type of a note is typically indicated using tags (although a note
can be anything, really, and can be voted as offtopic if it's not the
sort of thing that fits the post).  This means that tags like
`explanation` can be voted on just like solution tags, and a user who
submits a lot of good explanations (to their own answers or someone
else's) will get a high score in them.

Notes can be edited freely by their author.  They can't be edited by
other users, but you can submit a note to the original note requesting
changes, or your own competing note to the same post (e.g. an
alternative/competing explanation).