[DISCUSS] Update Roadmap

[DISCUSS] Update Roadmap

moon
Administrator
Hi Zeppelin users and developers,

The roadmap we have published at
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
is almost 9 months old, and it no longer reflects where the community is
going. It's time to update it.

Based on the mailing list, JIRA issues, pull requests, feedback from users,
conferences and meetings, I could summarize the major interests of users and
developers in 7 categories: Enterprise ready, Usability improvement,
Pluggability, Documentation, Backend integration, Notebook storage, and
Visualization.

I have listed related subjects under each category.

   - Enterprise ready
      - Authentication
         - Shiro authentication ZEPPELIN-548
         <https://issues.apache.org/jira/browse/ZEPPELIN-548>
      - Authorization
         - Notebook authorization PR-681
         <https://github.com/apache/incubator-zeppelin/pull/681>
      - Security
      - Multi-tenancy
      - Stability
   - Usability Improvement
      - UX improvement
      - Better Table data support
         - Download data as csv, etc PR-725
         <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
         <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
         <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
         <https://github.com/apache/incubator-zeppelin/pull/89>
         - Featureful table data display (pagination, etc)
   - Pluggability ZEPPELIN-533
   <https://issues.apache.org/jira/browse/ZEPPELIN-533>
      - Pluggable visualization
      - Dynamic Interpreter, notebook, visualization loading
      - Repository and registry for pluggable components
   - Improve documentation
      - Improve contents and readability
      - more tutorials, examples
   - Interpreter
      - Generic JDBC Interpreter
      - (spark)R Interpreter
      - Cluster manager for interpreter (Proposal
      <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
      )
      - more interpreters
   - Notebook storage
      - Versioning ZEPPELIN-540
      <http://issues.apache.org/jira/browse/ZEPPELIN-540>
      - more notebook storages
   - Visualization
      - More visualizations PR-152
      <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
      <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
      <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
      <https://github.com/apache/incubator-zeppelin/pull/321>
      - Customize graph (show/hide label, color, etc)


It will help anyone quickly grasp the overall interests of the project and
its direction. Based on this roadmap, we can discuss and redefine the scope
of the next release, 0.6.0, and its schedule.

What do you think? Any feedback would be appreciated.

Thanks,
moon
Re: [DISCUSS] Update Roadmap

DuyHai Doan
It's a great update, Moon.

On Monday I'll give a talk about Zeppelin at Voxxed Days Vienna; your email
will help me give some hints about the future of Zeppelin.



RE: [DISCUSS] Update Roadmap

Darren Govoni
In reply to this post by moon

   
Looks fantastic, moon.
Is there anything in the community regarding easier debugging with specific backends, e.g. Spark?

Re: [DISCUSS] Update Roadmap

Sourav Mazumder
In reply to this post by moon
Hi Moon,

This looks great.

My only suggestion would be to include a PR/feature: support for running
concurrent paragraphs/queries in Zeppelin.

Right now, if more than one user tries to run paragraphs in multiple
notebooks concurrently through a single Zeppelin instance (and a single
interpreter instance), performance is very slow. A queue evidently builds
up within the Zeppelin process and the interpreter process in that
scenario, because the time taken to move the status from start to pending
and from pending to running is very high compared to the actual running
time of a paragraph.

Without this, multi-tenancy support would be meaningless, as no one can
practically use it when multiple users are trying to connect to the same
instance of Zeppelin (and the related interpreter). A possible solution
would be to spawn a separate instance of the same interpreter for each
notebook/user.

Regards,
Sourav

Re: [DISCUSS] Update Roadmap

Zhong Wang
This is awesome! Really glad to see that the roadmap is adjusted based on
the community's needs. One feature I hope to see in 0.6.0 is folder
support, which can benefit both "UX improvement" and "Multi-tenancy".

Zhong

Re: [DISCUSS] Update Roadmap

Zhong Wang
In reply to this post by Sourav Mazumder
Sourav: I think this newly merged PR can help you
https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537

Re: [DISCUSS] Update Roadmap

moon
Administrator
In reply to this post by Darren Govoni
We've got a couple of questions on the mailing list about attaching debuggers
to the interpreter process, and I have also personally received questions
about debugging the interpreter process. A debugging facility can definitely
be added to the 'Interpreter' category.

Thanks for the great feedback.

Thanks,
moon

Re: [DISCUSS] Update Roadmap

moon
Administrator
In reply to this post by Zhong Wang
Zhong Wang,
Right, folder support would be quite useful. Thanks for your input.
I hope I can finish the work in PR-190
<https://github.com/apache/incubator-zeppelin/pull/190>.

Sourav,
Regarding concurrent execution, Zeppelin itself does not prevent
paragraphs/queries from running concurrently. Each interpreter can implement
its own scheduling policy. For example, the SparkSQL interpreter and
ShellInterpreter can already run paragraphs/queries concurrently.

SparkInterpreter is implemented with a FIFO scheduler because of the nature
of the Scala compiler. That's why users cannot run multiple paragraphs
concurrently when they work with SparkInterpreter.
But as Zhong Wang mentioned, PR-703 gives each notebook a separate Scala
compiler, so paragraphs can run concurrently as long as they are in
different notebooks.
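
To make the scheduling idea above concrete, here is a rough sketch in Python.
It is not Zeppelin code: the class names FifoScheduler and
PerNotebookSchedulers, the notebook ids and the helper functions are all
made up for illustration. It only shows the general pattern of one FIFO
queue per notebook, so paragraphs stay ordered inside a notebook while
different notebooks can run at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FifoScheduler:
    """Runs submitted paragraphs one at a time, in submission order."""

    def __init__(self):
        # A single worker thread gives FIFO, non-overlapping execution.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, paragraph, *args):
        return self._executor.submit(paragraph, *args)

    def shutdown(self):
        self._executor.shutdown(wait=True)


class PerNotebookSchedulers:
    """Hypothetical registry that keeps one FIFO scheduler per notebook."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_notebook = {}

    def submit(self, notebook_id, paragraph, *args):
        # Create the notebook's scheduler on first use, under a lock.
        with self._lock:
            if notebook_id not in self._by_notebook:
                self._by_notebook[notebook_id] = FifoScheduler()
            scheduler = self._by_notebook[notebook_id]
        return scheduler.submit(paragraph, *args)

    def shutdown(self):
        for scheduler in self._by_notebook.values():
            scheduler.shutdown()


def run_in_two_notebooks():
    """Two paragraphs in different notebooks must overlap to pass the barrier."""
    barrier = threading.Barrier(2)

    def paragraph(name):
        # Blocks until the other paragraph is running too (or times out).
        barrier.wait(timeout=5)
        return name

    schedulers = PerNotebookSchedulers()
    first = schedulers.submit("note-A", paragraph, "a1")
    second = schedulers.submit("note-B", paragraph, "b1")
    results = [first.result(), second.result()]
    schedulers.shutdown()
    return results


def run_in_one_notebook():
    """Paragraphs in the same notebook keep their FIFO order."""
    order = []
    schedulers = PerNotebookSchedulers()
    futures = [schedulers.submit("note-A", order.append, name)
               for name in ("p1", "p2", "p3")]
    for future in futures:
        future.result()
    schedulers.shutdown()
    return order


if __name__ == "__main__":
    print(run_in_two_notebooks())
    print(run_in_one_notebook())
```

The barrier in run_in_two_notebooks only releases when both paragraphs are
running at once, which a single shared FIFO queue could never allow. That is
the difference between one scheduler per interpreter and one per notebook.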
Thanks for the feedback!

Best,
moon

Re: [DISCUSS] Update Roadmap

Zhong Wang
In reply to this post by moon
Darren & moon: I am not sure whether this PR can help with your debugging
issue: https://github.com/apache/incubator-zeppelin/pull/749. I have no
problem attaching to the interpreter process with this fix.

Zhong

On Sun, Feb 28, 2016 at 8:01 PM, moon soo Lee <[hidden email]> wrote:

> We've got couple of questions in the mailing list about attaching debuggers
> to the interpreter process. And i also personally got questions about
> debugging interpreter process. Definitely, debugging facility can be added
> to 'Interpreter' category.
>
> Thanks for great feedback.
>
> Thanks,
> moon
>
> On Sat, Feb 27, 2016 at 1:15 PM Darren Govoni <[hidden email]> wrote:
>
> > Looks fantastic moon.
> >
> > Anything in the community with regards to easier debugging with specific
> > backends? E.g. spark.
> >
> > Sent from my Verizon Wireless 4G LTE smartphone
>

Re: [DISCUSS] Update Roadmap

Vinayak Agrawal
In reply to this post by moon
Moon,
The new roadmap looks very promising. I am very happy to see security in
the list.
I have some suggestions regarding Enterprise Ready features:

1. Job Scheduler - Can this be improved?
Currently the scheduler can be used with a cron expression or a pre-set time.
But in an enterprise solution, a notebook might be one piece of a larger
workflow. Can we look into scheduling notebooks based on other notebooks
finishing their jobs successfully?
This requirement arises in any ETL workflow where downstream users wait for
the ETL notebook to finish successfully. Only after that can the other
business-oriented notebooks be executed.

2. Importing a notebook - Is there a current requirement or future plan to
implement a feature that allows import-notebook-from-github? This would
allow users to share notebooks seamlessly.
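
To make the first suggestion concrete, here is a minimal sketch of dependency-based scheduling in Python (3.9+ for graphlib). It is not how Zeppelin's scheduler works; the notebook names and the run_notebook callable are hypothetical placeholders for whatever actually triggers a notebook run.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_notebook):
    """Run notebooks in dependency order.

    dependencies: maps a notebook name to the set of notebooks it waits on.
    run_notebook: hypothetical callable that returns True on success.
    Returns a dict of notebook name -> "FINISHED", "ERROR", or "SKIPPED".
    """
    results = {}
    # static_order() yields every notebook after all of its upstream notebooks.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(results[dep] != "FINISHED" for dep in upstream):
            # An upstream notebook failed or was skipped, so do not run this one.
            results[name] = "SKIPPED"
            continue
        results[name] = "FINISHED" if run_notebook(name) else "ERROR"
    return results


# Hypothetical workflow: two reports wait for the ETL notebook.
deps = {"etl": set(), "sales_report": {"etl"}, "churn_report": {"etl"}}
print(run_workflow(deps, lambda name: True))
```

The point of the sketch is that a downstream notebook runs only when everything it depends on has finished successfully, which is the behaviour asked for above.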

Thanks
Vinayak

On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <[hidden email]> wrote:

> Zhong Wang,
> Right, folder support would be quite useful. Thanks for the opinion.
> I hope I can finish the work in pr-190
> <https://github.com/apache/incubator-zeppelin/pull/190>.
>
> Sourav,
> Regarding concurrent running, Zeppelin itself doesn't limit running
> paragraphs/queries concurrently. Each interpreter can implement its own
> scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter
> can already run paragraphs/queries concurrently.
>
> SparkInterpreter is implemented with a FIFO scheduler because of the nature
> of the Scala compiler. That's why users cannot run multiple paragraphs
> concurrently when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala
> compiler, so paragraphs can run concurrently as long as they're in
> different notebooks.
> Thanks for the feedback!
>
> Best,
> moon
>
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <[hidden email]>
> wrote:
>
>> Sourav: I think this newly merged PR can help you
>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>
>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> [hidden email]> wrote:
>>
>>> Hi Moon,
>>>
>>> This looks great.
>>>
>>> My only suggestion would be to include a PR/feature - Support for
>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>
>>> Right now if more than one user tries to run paragraphs in multiple
>>> notebooks concurrently through a single Zeppelin instance (and single
>>> interpreter instance) the performance is very slow. It is obvious that the
>>> queue gets built up within the zeppelin process and interpreter process in
>>> that scenario as the time taken to move the status from start to pending
>>> and pending to running is very high compared to the actual running time of
>>> a paragraph.
>>>
>>> Without this, multi-tenancy support would be meaningless, as no one
>>> can practically use it in a situation where multiple users are trying to
>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>> possible solution would be to spawn a separate instance of the same
>>> interpreter at every notebook/user level.
>>>
>>> Regards,
>>> Sourav
>>>


--
Vinayak Agrawal


"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Eran Witkon
@Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to
existing scheduling/workflow tools such as https://oozie.apache.org/.
This requires better hooks and status reporting, but it doesn't turn Zeppelin
into an ETL/scheduler tool by itself.
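
As a rough illustration of what "status reporting" could mean for an external scheduler, here is a small Python sketch that turns a list of paragraph statuses into a process exit code. The status names are assumed to mirror Zeppelin's paragraph job states, and how the statuses are fetched is deliberately left out; the function name and the exit code convention are my own assumptions.

```python
# Status names are an assumption: they are meant to mirror paragraph job states.
FAILED = {"ERROR", "ABORT"}
IN_PROGRESS = {"READY", "PENDING", "RUNNING"}


def exit_code(paragraph_statuses):
    """Return 0 if every paragraph finished, 1 on any failure, 2 if still in progress."""
    statuses = set(paragraph_statuses)
    if statuses & FAILED:
        return 1
    if statuses & IN_PROGRESS:
        return 2
    return 0


# A workflow tool could treat 0 as success, 1 as failure, and 2 as "poll again".
print(exit_code(["FINISHED", "FINISHED"]))
```

A workflow tool only needs a reliable success/failure signal like this to decide whether downstream steps may start.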



Re: [DISCUSS] Update Roadmap

Shabeel Syed
In reply to this post by Vinayak Agrawal
Hi Moon,

       Some of my requirements.

   1. Can we achieve better memory management for notebooks? I'm also
   facing a similar OOM issue, like the one Dafeng mentioned in another
   discussion. I'm using the iframe view of a paragraph; can we load that
   code and its results into memory only when requested? I think this is
   one area to focus on.
   2. In the table/graph view, can we include the features below along with
   pagination?

                a) Search, similar to
https://docs.angularjs.org/api/ng/filter/filter
                b) Sorting of columns, possibly with custom sorting algorithms?
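
To show how search, sorting, and pagination could fit together, here is a minimal Python sketch. It is only an illustration of the idea, not Zeppelin's table code; the function name, parameters, and sample rows are all hypothetical.

```python
def table_view(rows, query="", sort_by=None, key=None, reverse=False,
               page=1, page_size=10):
    """Filter rows by substring, sort by one column, and return one page.

    rows: list of dicts mapping column name to value.
    key: optional custom sort function applied to the column value.
    """
    needle = query.lower()
    # Search: keep rows where any cell contains the query (case-insensitive).
    matched = [r for r in rows
               if any(needle in str(v).lower() for v in r.values())]
    if sort_by is not None:
        convert = key or (lambda value: value)
        matched.sort(key=lambda r: convert(r[sort_by]), reverse=reverse)
    # Pagination: pages are 1-based.
    start = (page - 1) * page_size
    return matched[start:start + page_size]


rows = [
    {"name": "alice", "age": "30"},
    {"name": "bob", "age": "4"},
    {"name": "carol", "age": "25"},
]
# Custom sorting: compare the "age" column numerically instead of as text.
print(table_view(rows, sort_by="age", key=int))
```

Filtering first and paginating last keeps the page contents consistent with the current search and sort order.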

    Also, any idea on GA for these suggested improvements?


Regards
Shabeel


Re: [DISCUSS] Update Roadmap

Benjamin Kim
In reply to this post by Eran Witkon
I concur with this suggestion. In the enterprise, management would like scheduled runs to be tracked, monitored, and given SLA constraints when they are mission critical. Alerts and notifications with clear error details are crucial for DevOps to respond. If Zeppelin notebooks can be executed by a third-party scheduling application, such as Oozie, then this requirement can be satisfied even if there are no immediate plans for a built-in one.



Re: [DISCUSS] Update Roadmap

Prasad Wagle
This is a great list.

In the enterprise ready section, what do you think about adding "High
Availability and Disaster Recovery"? We can start with updating the
documentation with best practices and scripts for a cold standby solution
and work towards an active-active
<https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
solution.

Another suggestion is to store metadata for notes, such as creator, last
updated (time and user), and number of views. We can show this information
on the top-level page in a table, with the ability to sort by any column.
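
A small Python sketch of what such a metadata record and sortable listing might look like. The field names and sample values are hypothetical and do not reflect Zeppelin's actual note format.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class NoteMeta:
    # Hypothetical per-note metadata, following the fields suggested above.
    name: str
    creator: str
    last_updated: datetime
    last_updated_by: str
    views: int = 0


def sorted_notes(notes, column, descending=False):
    """Return the notes sorted by any metadata column, e.g. "views"."""
    return sorted(notes, key=attrgetter(column), reverse=descending)


notes = [
    NoteMeta("ETL", "ann", datetime(2016, 2, 27), "ann", views=12),
    NoteMeta("Sales", "raj", datetime(2016, 2, 29), "ann", views=40),
]
for note in sorted_notes(notes, "views", descending=True):
    print(note.name, note.creator, note.last_updated.date(), note.views)
```

Because every column is a plain attribute, the same function covers sorting by creator, last update time, or view count.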

On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <[hidden email]> wrote:

> I concur with this suggestion. In the enterprise, management would like to
> see scheduled runs to be tracked, monitored, and given SLA constraints for
> the mission critical. Alerts and notifications are crucial for DevOps to
> respond with error clarification within it. If the Zeppelin notebooks can
> be executed by a third party scheduling application, such as Oozie, then
> this requirement can be satisfied if there are no immediate plans for a
> built-in one.
>
> On Feb 29, 2016, at 1:17 AM, Eran Witkon <[hidden email]> wrote:
>
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
> to existing scheduling/workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> [hidden email]> wrote:
>
>> Moon,
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>> 1. Job Scheduler - Can this be improved?
>> Currently the scheduler can be used with Cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebooks
>> based on other notebooks finishing their job successfully?
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that, other business oriented notebooks can be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>> Vinayak
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <[hidden email]> wrote:
>>
>>> Zhong Wang,
>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>> Hope I can finish the work in pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>> Sourav,
>>> Regarding concurrent running, Zeppelin itself does not prevent
>>> paragraphs/queries from running concurrently. Each interpreter can implement
>>> its own scheduling policy. For example, the SparkSQL interpreter and
>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>
>>> SparkInterpreter is implemented with a FIFO scheduler because of the nature
>>> of the Scala compiler. That's why users cannot run multiple paragraphs
>>> concurrently when they work with SparkInterpreter.
>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>> Scala compiler, so paragraphs run concurrently as long as they are in
>>> different notebooks.
>>> Thanks for the feedback!
>>>
>>> Best,
>>> moon
>>>
>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <[hidden email]> wrote:
>>>
>>>> Sourav: I think this newly merged PR can help you
>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>
>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>> [hidden email]> wrote:
>>>>
>>>>> Hi Moon,
>>>>>
>>>>> This looks great.
>>>>>
>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>
>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>> that scenario as the time taken to move the status from start to pending
>>>>> and pending to running is very high compared to the actual running time of
>>>>> a paragraph.
>>>>>
>>>>> Without this the multi tenancy support would be meaningless as no one
>>>>> can practically use it in a situation where multiple users are trying to
>>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>>> possible solution would be to spawn separate instance of the same
>>>>> interpreter at every notebook/user level.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>
>>
>> --
>> Vinayak Agrawal
>>
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>> ~Lord Alfred Tennyson
>>
>
>

Re: [DISCUSS] Update Roadmap

Amos B. Elberg
A few suggestions for the roadmap:

1. Increase unit test coverage.  I suggest we set thresholds -- say, 70% for
0.6, 85% for 0.7, and aim for 95% before 1.0.

2. Language support.  Right now, interpreters essentially have to be written
in Java, or at least have Java wrappers.  This is because the current design
has each interpreter class call a static (class-level) method when the class
is loaded, to register the Interpreter with Zeppelin.  In the long term, using
static methods will inevitably be a source of architectural problems.
(People have been saying that the feature should be removed entirely from Java
since 1998.)  In the short term, if we fix this, then it would be easy for
people to write interpreters in other JVM languages, such as Scala, Clojure,
Python (by Jython), Elixir (by whatever the Elixir JVM converter is called),
Groovy, etc.

3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues,
including many CI issues, trace back to the old system of installing Spark
under Zeppelin-home.  This is essentially a legacy thing from when Zeppelin
was a PR submitted as an add-on to Spark.  Right now, it doesn't buy us
anything -- but it does complicate the build process, create dependency
conflicts, and lead to user support issues.  

I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at
the latest.  

4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff
Steinmetz suggested this the other day.  It would simplify CI and the build
process, as well as maintenance as Spark heads toward 2.0.  I can't imagine
more than a tiny number of people who use zeppelin are using it with Spark
1.2, or even 1.3.

5.  Reform the configuration system.  Right now, Zeppelin configuration is set
in:
        - ZeppelinConfiguration.java (developers must edit)
        - The xml configuration (administrator must edit)
        - The env configuration file (administrator must edit)
        - Multiple json files such as interpreter.json (edited through the
          interface)

The result is kind of a mish-mash, and it creates user support issues when
people enter conflicting configurations or configurations in the wrong place.

It's also a developer issue because we haven't defined what takes precedence
over what.  

I suggest we introduce a part of the architecture which acts as an arbiter
for all configuration issues -- when any class needs to access or change
configuration, it goes through one place.  Then we can figure out how we
want to present configuration to the users.

6.  Disable most interpreters other than Spark-related (and MD) by default.
At this point, the number of interpreters has grown so much that it
complicates the build cycle and, well, just isn't necessary.
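
A minimal sketch of the single-arbiter idea in item 5, written in Python for brevity. The layer names and their precedence order are assumptions made for illustration (as noted above, no precedence has been defined yet), and the property keys are only examples.

```python
from collections import ChainMap


class ConfigArbiter:
    """Single access point for configuration; earlier layers win over later ones."""

    # Hypothetical precedence, highest first. This order is only an example,
    # not a decision anyone in the thread has made.
    PRECEDENCE = ("interface_json", "env_file", "site_xml", "code_defaults")

    def __init__(self, **layers):
        unknown = set(layers) - set(self.PRECEDENCE)
        if unknown:
            raise ValueError(f"unknown configuration layers: {sorted(unknown)}")
        self._layers = {name: dict(layers.get(name, {})) for name in self.PRECEDENCE}
        # ChainMap searches its maps left to right, so PRECEDENCE decides the winner.
        self._view = ChainMap(*(self._layers[name] for name in self.PRECEDENCE))

    def get(self, key):
        return self._view[key]

    def source_of(self, key):
        """Name the layer a value comes from, which helps with support questions."""
        for name in self.PRECEDENCE:
            if key in self._layers[name]:
                return name
        raise KeyError(key)

    def set(self, layer, key, value):
        # The ChainMap shares these dicts, so the change is visible immediately.
        self._layers[layer][key] = value


config = ConfigArbiter(
    code_defaults={"zeppelin.server.port": "8080", "zeppelin.notebook.dir": "notebook"},
    site_xml={"zeppelin.server.port": "8090"},
    env_file={"zeppelin.server.port": "9000"},
)
port = config.get("zeppelin.server.port")
port_source = config.source_of("zeppelin.server.port")
```

With every read going through one object, a conflicting setting can be traced to the layer that supplied it, and the precedence order lives in one tuple rather than being implied by load order.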

On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:

> This is a great list.
>
> In the enterprise ready section, what do you think about adding "High
> Availability and Disaster Recovery"? We can start with updating the
> documentation with best practices and scripts for a cold standby solution
> and work towards active-active
> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_a
> vailability_cold_warm_hot?lang=en> solution.
>
> Another suggestion is to store meta-data for notes like creator, last
> updated (time and user) and number of views. We can show this information
> in the top level page in a table format with ability to sort by any column.



Re: [DISCUSS] Update Roadmap

Jeff Steinmetz
Comments:

Regarding #2: Language support.  It would be great to see more Scala (once I was up to speed with Scala, I never wanted to look back at Java).
Regarding #3: Drop old Spark support.  This seems like low-hanging fruit: low impact and high reward.
Regarding #5: Configuration files.  We could take a cue from other great open source (Apache-licensed) projects, like ElasticSearch, and migrate to .yml files instead of verbose XML files, leaving environment variables for per-machine settings and global settings related to the Java runtime, such as JVM memory configs and directory paths like [FOO]_HOME.
An alternative to .yml is HOCON.  The Play Framework and Spark Job Server make use of easy-to-read HOCON-style files; HOCON is a JSON superset.
https://github.com/typesafehub/config/blob/master/HOCON.md

Typesafe licenses their entire config library under the Apache license, and it is plain Java with no dependencies:
https://github.com/typesafehub/config


Regarding #6: Excluding the more esoteric interpreters by default seems reasonable

Addition:  Create a common installer that also bundles a service-manager (Upstart) script for Debian or CentOS (not sure about Windows).  On Debian, install the package with a simple `dpkg -i` command.
Addition:  Build tools.  Does anybody have history with Gradle?  Is a switch from Maven to Gradle worth it?  I admit I am not an XML fan, and I realize this is not a simple task.  Gradle may make it easier to organize the builds if interpreters ever become plugins: each plugin could have its own build.gradle file.

"Improve documentation" is always a big yes.
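
To make the HOCON option from #5 more concrete: HOCON lets a later assignment such as `port = ${?SOME_VAR}` override an earlier default only when the variable is defined. The Python sketch below imitates just that one behaviour against a dict of environment variables. It is not the Typesafe config API and does not parse HOCON, and the keys and variable names are hypothetical.

```python
import re

# Matches a value that consists solely of an optional substitution, e.g. ${?ZEPPELIN_PORT}.
_OPTIONAL = re.compile(r"^\$\{\?([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve(assignments, env):
    """Apply (key, value) assignments in order; later assignments override earlier ones.

    A value written as ${?NAME} takes the value of NAME from env when it is present
    and is skipped otherwise, which leaves the earlier default in place.
    """
    settings = {}
    for key, value in assignments:
        match = _OPTIONAL.match(value)
        if match is None:
            settings[key] = value
        elif match.group(1) in env:
            settings[key] = env[match.group(1)]
    return settings


# Hypothetical keys and variable names: a default first, then an optional override.
assignments = [
    ("server.port", "8080"),
    ("server.port", "${?ZEPPELIN_PORT}"),
    ("notebook.dir", "notebook"),
    ("notebook.dir", "${?ZEPPELIN_NOTEBOOK_DIR}"),
]

resolved = resolve(assignments, {"ZEPPELIN_PORT": "9000"})
```

Here the port comes from the environment while the notebook directory keeps its file default, which is the split between per-machine and shared settings described above.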


Regards,
Jeff Steinmetz








On 4/6/16, 7:32 PM, "Amos Elberg" <[hidden email]> wrote:

>A few suggestions for the roadmap:
>
>1. Increase unit test coverage.  I suggest we set thresholds -- say, 70% for
>0.6, 85% for 0.7, and aim for 95% before 1.0.
>
>2. Language support.  Right now, interpreters essentially have to be written
>in Java, or at least have java wrappers.  This is because the current design
>has each interpreter class call a `static class` method when the class is
>loaded, to register the Interpreter with zeppelin.  In the long term, using
>static class methods will inevitably be a source of architectural problems.  
>(People have been saying that the feature should be removed entirely from Java
>since 1998.)  In the short term, if we fix this, then it would be easy for
>people to write interpreters in other jvm languages, such as Scala, Clojure,
>Python (by Jython), Elixir (by whatever the Elxir jvm converter is called),
>Groovy, etc.  
>
>3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues,
>including many CI issues, trace back to the old system of installing Spark
>under Zeppelin-home.  This is essentially a legacy thing from when Zeppelin
>was a PR submitted as an add-on to Spark.  Right now, it doesn't buy us
>anything -- but it does complicate the build process, create dependency
>conflicts, and lead to user support issues.  
>
>I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at
>the latest.  
>
>4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff
>Steinmetz suggested this the other day.  It would simplify CI and the build
>process, as well as maintenance as Spark heads toward 2.0.  I can't imagine
>more than a tiny number of people who use zeppelin are using it with Spark
>1.2, or even 1.3.
>
>5.  Reform the configuration system.  Right now, Zeppelin configuration is set
>in:  
> - ZeppelinConfiguration.java (developers must edit)
> - The xml configuration (administrator must edit)
> - The env configuration file (administrator must edit)
> - Multiple json files such as interpreter.json (edited through the
>interface)
>
>The result is kind of a mish-mash, and it creates user support issues when
>people enter conflicting configurations or configurations in the wrong place.
>
>It's also a developer issue because we haven't defined what takes precedence
>over what.  
>
>I suggest we introduce a part of the architecture which acts as an arbitrator
>for all configuraiton issues -- when any class needs to access or change
>configuration, it can go through one place.  Then we can figure out how we
>want to present configuration to the users.
>
>6.  Disable most interpreters other than Spark-related (and MD) by default.  
>At this point, we've proliferated so many interpreters, that it complicates
>the build cycle and, well, just isn't necessary.
>
>On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
>> This is a great list.
>>
>> In the enterprise ready section, what do you think about adding "High
>> Availability and Disaster Recovery"? We can start with updating the
>> documentation with best practices and scripts for a cold standby solution
>> and work towards active-active
>> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_a
>> vailability_cold_warm_hot?lang=en> solution.
>>
>> Another suggestion is to store meta-data for notes like creator, last
>> updated (time and user) and number of views. We can show this information
>> in the top level page in a table format with ability to sort by any column.
>>
>> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <[hidden email]> wrote:
>> > I concur with this suggestion. In the enterprise, management would like to
>> > see scheduled runs to be tracked, monitored, and given SLA constraints for
>> > the mission critical. Alerts and notifications are crucial for DevOps to
>> > respond with error clarification within it. If the Zeppelin notebooks can
>> > be executed by a third party scheduling application, such as Oozie, then
>> > this requirement can be satisfied if there are no immediate plans for a
>> > built-in one.
>> >
>> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <[hidden email]> wrote:
>> >
>> > @Vinayak Agrawal I would suggest adding the ability to connect zeppelin
>> > to existing scheduling tools\workflow tools such as
>> > https://oozie.apache.org/. this requires betters hooks and status
>> > reporting but doesn't make zeppeling and ETL\scheduler tool by itself/
>> >
>> >
>> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>> >
>> > [hidden email]> wrote:
>> >> Moon,
>> >> The new roadmap looks very promising. I am very happy to see security in
>> >> the list.
>> >> I have some suggestions regarding Enterprise Ready features:
>> >>
>> >> 1. Job Scheduler - Can this be improved?
>> >> Currently the scheduler can be used with Cron expression or a pre-set
>> >> time. But in an enterprise solution, a notebook might be one piece of the
>> >> workflow. Can we look towards the functionality of scheduling notebook's
>> >> based on other notebooks finishing their job successfully?
>> >> This requirement would arise in any ETL workflow, where all the
>> >> downstream users wait for the ETL notebook to finish successfully. Only
>> >> after that, other business oriented notebooks can be executed.
>> >>
>> >> 2. Importing a notebook - Is there a current requirement or future plan
>> >> to implement a feature that allows import-notebook-from-github? This
>> >> would
>> >> allow users to share notebooks seamlessly.
>> >>
>> >> Thanks
>> >> Vinayak
>> >>
>> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <[hidden email]> wrote:
>> >>> Zhong Wang,
>> >>> Right, Folder support would be quite useful. Thanks for the opinion.
>> >>
>> >> Hope i can finish the work pr-190
>> >>
>> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>> >>>
>> >>>
>> >>> Sourav,
>> >>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>> >>> paragraph/query concurrently. Interpreter can implement it's own
>> >>> scheduling
>> >>> policy. For example, SparkSQL interpreter and ShellInterpreter can
>> >>> already
>> >>> run paragraph/query concurrently.
>> >>>
>> >>> SparkInterpreter is implemented with FIFO scheduler considering nature
>> >>> of scala compiler. That's why user can not run multiple paragraph
>> >>> concurrently when they work with SparkInterpreter.
>> >>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>> >>> separate scala compiler so paragraphs run concurrently, while they're in
>> >>> different notebooks.
>> >>> Thanks for the feedback!
>> >>>
>> >>> Best,
>> >>> moon
>> >>
>> >> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <[hidden email]>
>> >>
>> >>> wrote:
>> >> Sourav: I think this newly merged PR can help you
>> >>
>> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-1855
>> >>>> 82537
>> >>>>
>> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> >>>
>> >>>> [hidden email]> wrote:
>> >>> Hi Moon,
>> >>>
>> >>>>> This looks great.
>> >>>>>
>> >>>>> My only suggestion would be to include a PR/feature - Support for
>> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
>> >>>>>
>> >>>>> Right now if more than one user tries to run paragraphs in multiple
>> >>>>> notebooks concurrently through a single Zeppelin instance (and single
>> >>>>> interpreter instance) the performance is very slow. It is obvious that
>> >>>>> the
>> >>>>> queue gets built up within the zeppelin process and interpreter
>> >>>>> process in
>> >>>>> that scenario as the time taken to move the status from start to
>> >>>>> pending
>> >>>>> and pending to running is very high compared to the actual running
>> >>>>> time of
>> >>>>> a paragraph.
>> >>>>>
>> >>>>> Without this the multi tenancy support would be meaningless as no one
>> >>>>> can practically use it in a situation where multiple users are trying
>> >>>>> to
>> >>>>> connect to the same instance of Zeppelin (and the related
>> >>>>> interpreter). A
>> >>>>> possible solution would be to spawn separate instance of the same
>> >>>>> interpreter at every notebook/user level.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Sourav
>> >>>>
>> >>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <[hidden email]> wrote:
>> >>>>
>> >>>> Hi Zeppelin users and developers,
>> >>>>
>> >>>>>> The roadmap we have published at
>> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> >>>>>> is almost 9 month old, and it doesn't reflect where the community
>> >>>>>> goes anymore. It's time to update.
>> >>>>>>
>> >>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>> >>>>>> users, conferences and meetings, I could summarize the major interest
>> >>>>>> of
>> >>>>>> users and developers in 7 categories. Enterprise ready, Usability
>> >>>>>> improvement, Pluggability, Documentation, Backend integration,
>> >>>>>> Notebook
>> >>>>>> storage, and Visualization.
>> >>>>>>
>> >>>>>> And i could list related subjects under each categories.
>> >>>>>>
>> >>>>>>    - Enterprise ready
>> >>>>>>       - Authentication
>> >>>>>>          - Shiro authentication ZEPPELIN-548
>> >>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>> >>>>>>       - Authorization
>> >>>>>>          - Notebook authorization PR-681
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>> >>>>>>       - Security
>> >>>>>>       - Multi-tenancy
>> >>>>>>       - Stability
>> >>>>>>    - Usability Improvement
>> >>>>>>       - UX improvement
>> >>>>>>       - Better Table data support
>> >>>>>>          - Download data as csv, etc PR-725
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>> >>>>>>          PR-714
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
>> >>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>> >>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>> >>>>>>          - Featureful table data display (pagination, etc)
>> >>>>>>    - Pluggability ZEPPELIN-533
>> >>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>> >>>>>>       - Pluggable visualization
>> >>>>>>       - Dynamic Interpreter, notebook, visualization loading
>> >>>>>>       - Repository and registry for pluggable components
>> >>>>>>    - Improve documentation
>> >>>>>>       - Improve contents and readability
>> >>>>>>       - more tutorials, examples
>> >>>>>>    - Interpreter
>> >>>>>>       - Generic JDBC Interpreter
>> >>>>>>       - (spark)R Interpreter
>> >>>>>>       - Cluster manager for interpreter (Proposal
>> >>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>> >>>>>>       - more interpreters
>> >>>>>>    - Notebook storage
>> >>>>>>       - Versioning ZEPPELIN-540
>> >>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>> >>>>>>       - more notebook storages
>> >>>>>>    - Visualization
>> >>>>>>       - More visualizations PR-152
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>> >>>>>>       - Customize graph (show/hide label, color, etc)
>> >>>>>>
>> >>>>>> It will help anyone quickly grasp the overall interests of the project
>> >>>>>> and its direction. And based on this roadmap, we can discuss and
>> >>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
>> >>>>>>
>> >>>>>> What do you think? Any feedback would be appreciated.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> moon
>> >>
>> >> --
>> >> Vinayak Agrawal
>> >>
>> >>
>> >> "To Strive, To Seek, To Find and Not to Yield!"
>> >> ~Lord Alfred Tennyson
>
>


Re: [DISCUSS] Update Roadmap

John Omernik
Jeff, regarding #5: as a close follower of Apache Drill, I would highly
recommend HOCON for config. HOCON, at least as it's implemented in
Drill, is a great way to visualize configs, and it still gives administrators
the ability to use environment variables through substitutions in the HOCON.
This way people could hard-code values in the HOCON, or, if they want to use
their own env variables from the Zeppelin-env, they can. For me, as a user and
administrator, this is powerful when using something like Mesos and
Marathon to deploy Zeppelin instances.  Huge +1 for HOCON.
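
To make the override idea concrete: in HOCON, a hard-coded value can be followed by an optional substitution such as `${?NAME}`, and the second assignment only takes effect when that variable is defined. The Python below is a minimal sketch of that rule only, not a HOCON parser; the key `zeppelin.server.port` and the variable `ZEPPELIN_PORT` are used here purely for illustration, and the environment is passed in as a plain dict.

```python
import re

# Matches a whole-value optional substitution such as "${?ZEPPELIN_PORT}".
_OPTIONAL_SUBST = re.compile(r"^\$\{\?([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve(assignments, env):
    """Apply (key, value) assignments in order.

    Mimics the HOCON rule that `key = ${?VAR}` is ignored when VAR is
    undefined, so an earlier hard-coded value survives.
    """
    resolved = {}
    for key, value in assignments:
        match = _OPTIONAL_SUBST.match(value) if isinstance(value, str) else None
        if match is None:
            resolved[key] = value                 # plain, hard-coded value
        elif match.group(1) in env:
            resolved[key] = env[match.group(1)]   # variable set: override
        # variable unset: keep whatever was assigned earlier
    return resolved


# Illustrative settings: a hard-coded default, then an optional override.
SETTINGS = [
    ("zeppelin.server.port", "8080"),
    ("zeppelin.server.port", "${?ZEPPELIN_PORT}"),
]

print(resolve(SETTINGS, {}))
print(resolve(SETTINGS, {"ZEPPELIN_PORT": "9090"}))
```

In real use the second argument would be `os.environ`, which is how a Marathon app definition could change a setting without touching the file.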

John


On Thursday, April 7, 2016, Jeff Steinmetz <[hidden email]>
wrote:

> Comments:
>
> Regarding #2: Language support.  It would be great to see more Scala (once
> up to speed with Scala I never wanted to look back at Java)
> Regarding #3: Drop old SPARK support.  Seems like low hanging fruit, low
> impact & high reward.
> Regarding #5: Configuration Files. We could take a cue from other great
> open source (Apache license) projects, like ElasticSearch, and migrate to
> .yml files instead of verbose XML files and leave Environment variables for
> per-machine settings & global settings related to the Java runtime, JVM
> memory configs and directory paths such as [FOO]_HOME.
> An alternative to .yml is HOCON.  The Play Framework and Spark Job Server
> make use of easy to read HOCON style files, which is a JSON superset.
> https://github.com/typesafehub/config/blob/master/HOCON.md
>
> Typesafe licenses their entire config library under the Apache license,
> and it uses plain Java with no dependencies:
> https://github.com/typesafehub/config
>
>
> Regarding #6: Excluding the more esoteric interpreters by default seems
> reasonable
>
> Addition:  Create a common installer that also bundles a service manager
> upstart script for Debian or CentOS (not sure about Windows).  Install via
> Debian package with a simple `dpkg -i` command.
> Addition:  Build tools.  Does anybody have history with Gradle?  Is a
> switch from Maven to Gradle worth it?  I admit I am not an XML fan and
> realize this is not a simple task.  Gradle may make it easier to organize
> the builds if interpreters ever became plugins.  Each plugin could have its
> own build.gradle file.
>
> "Improve documentation" is always a big yes.
>
>
> Regards,
> Jeff Steinmetz
>
>
>
>
>
>
>
>
> On 4/6/16, 7:32 PM, "Amos Elberg" <[hidden email]> wrote:
>
> >A few suggestions for the roadmap:
> >
> >1. Increase unit test coverage.  I suggest we set thresholds -- say, 70%
> >for 0.6, 85% for 0.7, and aim for 95% before 1.0.
> >
> >2. Language support.  Right now, interpreters essentially have to be
> >written in Java, or at least have Java wrappers.  This is because the
> >current design has each interpreter class call a `static class` method
> >when the class is loaded, to register the Interpreter with Zeppelin.  In
> >the long term, using static class methods will inevitably be a source of
> >architectural problems.  (People have been saying that the feature should
> >be removed entirely from Java since 1998.)  In the short term, if we fix
> >this, then it would be easy for people to write interpreters in other JVM
> >languages, such as Scala, Clojure, Python (by Jython), Elixir (by whatever
> >the Elixir JVM converter is called), Groovy, etc.
> >
> >3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues,
> >including many CI issues, trace back to the old system of installing Spark
> >under Zeppelin-home.  This is essentially a legacy thing from when
> >Zeppelin was a PR submitted as an add-on to Spark.  Right now, it doesn't
> >buy us anything -- but it does complicate the build process, create
> >dependency conflicts, and lead to user support issues.
> >
> >I suggest we deprecate this ASAP, and remove it entirely before 0.7, or
> >0.8 at the latest.
> >
> >4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff
> >Steinmetz suggested this the other day.  It would simplify CI and the
> >build process, as well as maintenance as Spark heads toward 2.0.  I can't
> >imagine more than a tiny number of people who use Zeppelin are using it
> >with Spark 1.2, or even 1.3.
> >
> >5.  Reform the configuration system.  Right now, Zeppelin configuration
> >is set in:
> >       - ZeppelinConfiguration.java (developers must edit)
> >       - The xml configuration (administrator must edit)
> >       - The env configuration file (administrator must edit)
> >       - Multiple json files such as interpreter.json (edited through the
> >         interface)
> >
> >The result is kind of a mish-mash, and it creates user support issues when
> >people enter conflicting configurations or configurations in the wrong
> >place.
> >
> >It's also a developer issue because we haven't defined what takes
> >precedence over what.
> >
> >I suggest we introduce a part of the architecture which acts as an
> >arbitrator for all configuration issues -- when any class needs to access
> >or change configuration, it can go through one place.  Then we can figure
> >out how we want to present configuration to the users.
> >
> >6.  Disable most interpreters other than Spark-related (and MD) by
> >default.  At this point, we've proliferated so many interpreters, that it
> >complicates the build cycle and, well, just isn't necessary.
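
On point 5 above, the "one place" for configuration could look roughly like the sketch below. It is only an illustration in Python: the class name `ConfigArbiter`, the layer names, and above all the precedence order (env over xml over built-in defaults) are assumptions, since the thread itself says precedence has not been defined.

```python
from collections import ChainMap


class ConfigArbiter:
    """Single access point that answers every configuration lookup.

    Layers are passed from highest to lowest priority. The ordering used
    in the example below is an assumption, not Zeppelin's actual behaviour.
    """

    def __init__(self, layers):
        # layers: list of (name, dict) pairs, highest priority first
        self._names = [name for name, _ in layers]
        self._maps = [dict(values) for _, values in layers]
        self._chain = ChainMap(*self._maps)  # first map wins on lookup

    def get(self, key):
        return self._chain[key]

    def source_of(self, key):
        """Report which layer supplied the value, to help debug conflicts."""
        for name, values in zip(self._names, self._maps):
            if key in values:
                return name
        raise KeyError(key)


# Hypothetical layers and values, for illustration only.
config = ConfigArbiter([
    ("env", {"zeppelin.server.port": "9090"}),
    ("xml", {"zeppelin.server.port": "8080", "zeppelin.notebook.dir": "notebook"}),
    ("defaults", {"zeppelin.server.port": "8080", "zeppelin.server.addr": "0.0.0.0"}),
])

print(config.get("zeppelin.server.port"), config.source_of("zeppelin.server.port"))
```

Being able to ask where a value came from is the part that would help with the support issues described above, whatever precedence order is finally chosen.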
> >
> >On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
> >> This is a great list.
> >>
> >> In the enterprise ready section, what do you think about adding "High
> >> Availability and Disaster Recovery"? We can start with updating the
> >> documentation with best practices and scripts for a cold standby
> >> solution and work towards active-active
> >> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
> >> solution.
> >>
> >> Another suggestion is to store meta-data for notes like creator, last
> >> updated (time and user) and number of views. We can show this
> >> information in the top level page in a table format with ability to
> >> sort by any column.
> >>
> >> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <[hidden email]> wrote:
> >> > I concur with this suggestion. In the enterprise, management would
> >> > like to see scheduled runs tracked, monitored, and given SLA
> >> > constraints when they are mission critical. Alerts and notifications,
> >> > with the error clearly identified in them, are crucial for DevOps to
> >> > respond. If the Zeppelin notebooks can be executed by a third party
> >> > scheduling application, such as Oozie, then this requirement can be
> >> > satisfied if there are no immediate plans for a built-in one.
> >> >
> >> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <[hidden email]> wrote:
> >> >
> >> > @Vinayak Agrawal I would suggest adding the ability to connect
> >> > Zeppelin to existing scheduling tools\workflow tools such as
> >> > https://oozie.apache.org/. This requires better hooks and status
> >> > reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
> >> >
> >> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <[hidden email]> wrote:
> >> >> Moon,
> >> >> The new roadmap looks very promising. I am very happy to see
> >> >> security in the list.
> >> >> I have some suggestions regarding Enterprise Ready features:
> >> >>
> >> >> 1. Job Scheduler - Can this be improved?
> >> >> Currently the scheduler can be used with a cron expression or a
> >> >> pre-set time. But in an enterprise solution, a notebook might be one
> >> >> piece of the workflow. Can we look towards the functionality of
> >> >> scheduling notebooks based on other notebooks finishing their job
> >> >> successfully?
> >> >> This requirement would arise in any ETL workflow, where all the
> >> >> downstream users wait for the ETL notebook to finish successfully.
> >> >> Only after that, other business oriented notebooks can be executed.
> >> >>
> >> >> 2. Importing a notebook - Is there a current requirement or future
> >> >> plan to implement a feature that allows import-notebook-from-github?
> >> >> This would allow users to share notebooks seamlessly.
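
The dependency-based scheduling asked for in point 1 can be sketched as a topological walk over a "waits for" graph. This is an illustration only, not an existing Zeppelin feature: the function `run_workflow`, the note names, and the `run_note` callback are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_note):
    """Run notes in dependency order.

    dependencies: {note: set of notes it waits for}
    run_note: callable(note) -> bool, True when the note finished successfully

    A note is skipped when any upstream note failed or was itself skipped.
    """
    status = {}
    for note in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(note, ())
        if all(status[u] == "success" for u in upstream):
            status[note] = "success" if run_note(note) else "failed"
        else:
            status[note] = "skipped"
    return status


# Hypothetical workflow: two reports wait for the ETL note, and a summary
# waits for both reports.
WORKFLOW = {
    "etl": set(),
    "sales_report": {"etl"},
    "finance_report": {"etl"},
    "summary": {"sales_report", "finance_report"},
}

# Pretend the ETL note fails: nothing downstream should run.
print(run_workflow(WORKFLOW, lambda note: note != "etl"))
```

An external scheduler such as Oozie would supply the same ordering from outside, which is why the two proposals in this thread are alternatives rather than conflicts.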
> >> >>
> >> >> Thanks
> >> >> Vinayak
> >> >>
> >> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <[hidden email]> wrote:
> >> >>> Zhong Wang,
> >> >>> Right, Folder support would be quite useful. Thanks for the opinion.
> >> >>> Hope I can finish the work pr-190
> >> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
> >> >>>
> >> >>>
> >> >>> Sourav,
> >> >>> Regarding concurrent running, Zeppelin doesn't limit running
> >> >>> paragraphs/queries concurrently. An interpreter can implement its own
> >> >>> scheduling policy. For example, SparkSQL interpreter and
> >> >>> ShellInterpreter can already run paragraphs/queries concurrently.
> >> >>>
> >> >>> SparkInterpreter is implemented with a FIFO scheduler considering the
> >> >>> nature of the Scala compiler. That's why users cannot run multiple
> >> >>> paragraphs concurrently when they work with SparkInterpreter.
> >> >>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
> >> >>> Scala compiler, so paragraphs run concurrently as long as they're in
> >> >>> different notebooks.
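
The difference between the two scheduling policies described above can be shown with a small analogy. The sketch below is Python, not Zeppelin's scheduler code: `make_scheduler` and `run_paragraphs` are hypothetical names, and a single-worker thread pool stands in for a FIFO scheduler while a multi-worker pool stands in for a parallel one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_scheduler(policy):
    """Return an executor for the given policy.

    "fifo" runs one paragraph at a time in submission order;
    anything else allows several paragraphs to run at once.
    """
    workers = 1 if policy == "fifo" else 4
    return ThreadPoolExecutor(max_workers=workers)


def run_paragraphs(policy, count=2, timeout=1.0):
    """Submit `count` paragraphs that wait for each other at a barrier.

    Returns True only if all of them were running at the same time.
    """
    barrier = threading.Barrier(count)

    def paragraph():
        try:
            barrier.wait(timeout=timeout)  # passes only if all arrive together
            return True
        except threading.BrokenBarrierError:
            return False

    with make_scheduler(policy) as scheduler:
        futures = [scheduler.submit(paragraph) for _ in range(count)]
        return all(f.result() for f in futures)


print("parallel ran concurrently:", run_paragraphs("parallel"))
print("fifo ran concurrently:", run_paragraphs("fifo", timeout=0.2))
```

Under the FIFO policy the first paragraph waits alone until its timeout, which is the same queue build-up Sourav reported when several users share one interpreter instance.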
> >> >>> Thanks for the feedback!
> >> >>>
> >> >>> Best,
> >> >>> moon
> >> >>
> >> >>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <[hidden email]> wrote:
> >> >>>> Sourav: I think this newly merged PR can help you
> >> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
> >> >>>>
> >> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <[hidden email]> wrote:
> >> >>>>> Hi Moon,
> >> >>>>>
> >> >>>>> This looks great.
> >> >>>>>
> >> >>>>> My only suggestion would be to include a PR/feature - Support for
> >> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
> >> >>>>>
> >> >>>>> Right now if more than one user tries to run paragraphs in
> >> >>>>> multiple notebooks concurrently through a single Zeppelin instance
> >> >>>>> (and single interpreter instance) the performance is very slow. It
> >> >>>>> is obvious that the queue gets built up within the Zeppelin process
> >> >>>>> and interpreter process in that scenario, as the time taken to move
> >> >>>>> the status from start to pending and pending to running is very
> >> >>>>> high compared to the actual running time of a paragraph.
> >> >>>>>
> >> >>>>> Without this the multi tenancy support would be meaningless, as no
> >> >>>>> one can practically use it in a situation where multiple users are
> >> >>>>> trying to connect to the same instance of Zeppelin (and the related
> >> >>>>> interpreter). A possible solution would be to spawn a separate
> >> >>>>> instance of the same interpreter at every notebook/user level.
> >> >>>>>
> >> >>>>> Regards,
> >> >>>>> Sourav
> >> >>>>
> >> >>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <[hidden email]> wrote:
> >> >>>>>> Hi Zeppelin users and developers,
> >> >>>>>>
> >> >>>>>> The roadmap we have published at
> >> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> >> >>>>>> is almost 9 months old, and it doesn't reflect where the community
> >> >>>>>> goes anymore. It's time to update.
> >> >>>>>>
> >> >>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
> >> >>>>>> users, conferences and meetings, I could summarize the major
> >> >>>>>> interest of users and developers in 7 categories. Enterprise ready,
> >> >>>>>> Usability improvement, Pluggability, Documentation, Backend
> >> >>>>>> integration, Notebook storage, and Visualization.
> >> >>>>>>
> >> >>>>>> And I could list related subjects under each category.
> >> >>>>>>
> >> >>>>>>    - Enterprise ready
> >> >>>>>>       - Authentication
> >> >>>>>>          - Shiro authentication ZEPPELIN-548
> >> >>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
> >> >>>>>>       - Authorization
> >> >>>>>>          - Notebook authorization PR-681
> >> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
> >> >>>>>>       - Security
> >> >>>>>>       - Multi-tenancy
> >> >>>>>>       - Stability
> >> >>>>>>    - Usability Improvement
> >> >>>>>>       - UX improvement
> >> >>>>>>       - Better Table data support
> >> >>>>>>          - Download data as csv, etc PR-725
> >> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
> >> >>>>>>          PR-714
> >> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
> >> >>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
> >> >>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
> >> >>>>>>          - Featureful table data display (pagination, etc)
> >> >>>>>>    - Pluggability ZEPPELIN-533
> >> >>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
> >> >>>>>>       - Pluggable visualization
> >> >>>>>>       - Dynamic Interpreter, notebook, visualization loading
> >> >>>>>>       - Repository and registry for pluggable components
> >> >>>>>>    - Improve documentation
> >> >>>>>>       - Improve contents and readability
> >> >>>>>>       - more tutorials, examples
> >> >>>>>>    - Interpreter
> >> >>>>>>       - Generic JDBC Interpreter
> >> >>>>>>       - (spark)R Interpreter
> >> >>>>>>       - Cluster manager for interpreter (Proposal
> >> >>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
> >> >>>>>>       - more interpreters
> >> >>>>>>    - Notebook storage
> >> >>>>>>       - Versioning ZEPPELIN-540
> >> >>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
> >> >>>>>>       - more notebook storages
> >> >>>>>>    - Visualization
> >> >>>>>>       - More visualizations PR-152
> >> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
> >> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
> >> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
> >> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
> >> >>>>>>       - Customize graph (show/hide label, color, etc)
> >> >>>>>>
> >> >>>>>> It will help anyone quickly grasp the overall interests of the
> >> >>>>>> project and its direction. And based on this roadmap, we can
> >> >>>>>> discuss and re-define the scope of the next release, 0.6.0, and
> >> >>>>>> its schedule.
> >> >>>>>>
> >> >>>>>> What do you think? Any feedback would be appreciated.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> moon
> >> >>
> >> >> --
> >> >> Vinayak Agrawal
> >> >>
> >> >>
> >> >> "To Strive, To Seek, To Find and Not to Yield!"
> >> >> ~Lord Alfred Tennyson
> >
> >
>
>

--
Sent from my iThing