[GitHub] zeppelin pull request #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to supp...

zjffdu
GitHub user jongyoul opened a pull request:

    https://github.com/apache/zeppelin/pull/2329

    [WIP][PoC] ZEPPELIN-2040 ClusterManager to support launching interpreter in a cluster

    ### What is this PR for?
    Launch interpreters in a yarn cluster. This is at PoC level for now, and the remaining steps are listed under Todos below. The main classes are `Client`, `ClusterManager`, and `RemoteInterpreterYarnProcess`.
   
    ### What type of PR is it?
    [Bug Fix | Improvement | Feature | Documentation | Hot Fix | Refactoring]
   
    ### Todos
    * [ ] - Split the yarn dependencies out into a separate module
    * [ ] - Support yarn-cluster without setting SPARK_HOME
    * [ ] - Remove unused files
    * [ ] - TBD
   
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-2040
   
    ### How should this be tested?
    1. Install Hadoop on your local machine, e.g. following https://dtflaneur.wordpress.com/2015/10/02/installing-hadoop-on-mac-osx-el-capitan/
    1. Add `zeppelin.cluster_manager=yarn` to your spark interpreter setting
    1. Run a spark script to test it
   
    ### Screenshots (if appropriate)
   
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jongyoul/zeppelin ZEPPELIN-2040

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/2329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2329
   
----
commit 4498cbe0bc46aa7f3e66ea3c1c5a38c750d1577c
Author: Jongyoul Lee <[hidden email]>
Date:   2017-04-15T15:39:37Z

    First step to implement from scratch

commit db753c4cd2253911c70935beb2d2e9bd46710e74
Author: Jongyoul Lee <[hidden email]>
Date:   2017-04-23T21:47:31Z

    Added license header for avoiding rat failed

commit 96b610760406521e82c8cbeeec2aea977324b5a4
Author: Jongyoul Lee <[hidden email]>
Date:   2017-04-24T09:14:51Z

    WIP

commit 0c3f898f53b7a4931993289019d29702c54bd696
Author: Jongyoul Lee <[hidden email]>
Date:   2017-04-24T14:55:08Z

    WIP

commit 060fb2742a32e8c2b211ad75f921bbdaf8322a3e
Author: Jongyoul Lee <[hidden email]>
Date:   2017-04-30T04:34:12Z

    WIP

commit 6117bef9a64bf6abf6aa2d61241d6bc4a66c7c01
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-01T08:57:40Z

    remove zeppelin-cluster/yarn

commit 91a42b83005ec7bd6fde8b8ea6ac6c37a5ae021b
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-01T23:56:43Z

    WIP

commit 0fbf81568581f57c5cb4b7e416931328f2d79e5e
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-03T16:10:52Z

    POC in my local

commit 2bb895a136e5527e90725df563433ab60cb2b142
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-07T15:49:37Z

    Fixed rat issue

commit c31d19d9f2e5fda2a6152ae8910b13274d77d9ce
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-07T16:23:33Z

    Added license header

commit 8b6b1b85edd9ccaa8dd5f27559c590cd734b15c1
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-10T00:37:48Z

    First version of yarn cluster manager

commit 09a8fe8a9b4d579fb7d0c56e77a00142f577021c
Author: Jongyoul Lee <[hidden email]>
Date:   2017-05-10T03:01:18Z

    Fixed style
    Fixed rat issue

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @jongyoul Thanks for the PoC. One question about cluster mode for the spark interpreter: it looks like you are building general yarn app support for all interpreters. Spark, however, already supports yarn-cluster mode, so we could deploy the spark interpreter in yarn-cluster mode via spark-submit. Your approach may work, but I am afraid it would lose some features compared to spark's native yarn support, and it would add the maintenance overhead of keeping feature parity between spark's native yarn-cluster mode and zeppelin's yarn-cluster mode.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @zjffdu Thanks for the quick review. I know spark already supports yarn-cluster mode by itself. In Zeppelin's case, SparkInterpreter runs in yarn-client mode, which means the driver is still in the process that zeppelin launches. At the same time, SparkInterpreter runs inside the yarn cluster as an application master, and it launches Spark's application master as a second application master in the yarn cluster. This is the same as what spark-submit does, except that the spark driver is also in the yarn cluster. If you test it in a cluster, you will see something like this:
   
    ![pasted image at 2017_05_04 01_07](https://cloud.githubusercontent.com/assets/3612566/25930293/72d7c2f2-3640-11e7-9647-87985173e5b5.png)
   
    It is launched in yarn-client mode. The application of type `ZEPPELIN INTERPRETER` is the driver, and the one of type `SPARK` is the application master that spark launches.
   
    BTW, I looked into how livy supports yarn-cluster mode, and its approach can be adopted for Zeppelin, too. I'll add that feature soon.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    Do you mean there would be 2 yarn apps for launching one spark interpreter? That looks a little weird to me. In this approach, we still launch the remote interpreter process on the zeppelin host, which doesn't do much to solve the memory issue of the zeppelin host. Besides, it wastes yarn resources because it requires launching 2 yarn apps. Why not leverage spark-submit to support yarn-cluster mode? I also looked at all the interpreters of zeppelin, and most of them only play a client role: the computation happens in the backend, not on the interpreter side. So I think it would be fine to launch these interpreters in shared/scoped mode for all users. The spark interpreter would be a special case where we could use its native yarn-cluster support. I think that would be the simplest way.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    In the case of Spark, there are two yarn apps, and I agree that it's a bit weird for users. On the other hand, all other interpreters, such as python, benefit from being launched in a yarn cluster. I think it would be better to support yarn-cluster mode in the case of Spark.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    It makes our logic much more difficult, but it improves users' experience.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    > it's improve users' experiences
    Sorry, I don't understand how it improves users' experience compared to spark's native yarn-cluster support. Also, does this approach mean that all interpreters must use the same ClusterManager? If so, that doesn't make sense to me either, because we may want to run md in local mode and spark in yarn-cluster mode.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    I meant 'not in the case of spark'. And this implementation satisfies your needs: you can launch some interpreters, like md, in local mode and other interpreters in yarn.
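
The per-interpreter choice described above can be sketched as a lookup on each interpreter setting. This is only an illustration, not code from the PR: the property name `zeppelin.cluster_manager` and the value `yarn` come from the PR description, while the `SETTINGS` dictionary, the function name, and the `local` default are assumptions.

```python
from typing import Dict

# Hypothetical interpreter settings. Only the property name
# `zeppelin.cluster_manager` and the value "yarn" appear in the PR
# description; everything else here is an assumption.
SETTINGS: Dict[str, Dict[str, str]] = {
    "md": {},
    "python": {"zeppelin.cluster_manager": "yarn"},
}


def cluster_manager_for(name: str,
                        settings: Dict[str, Dict[str, str]] = SETTINGS,
                        default: str = "local") -> str:
    """Return the cluster manager configured for one interpreter setting."""
    props = settings.get(name, {})
    # An interpreter without the property falls back to the assumed default.
    return props.get("zeppelin.cluster_manager", default)


print(cluster_manager_for("md"))
print(cluster_manager_for("python"))
```

Keeping the decision on the individual setting is what would let md stay local while another interpreter runs in yarn.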



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    And what I meant by UX is supporting spark's native yarn-cluster mode in Zeppelin :-)



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @jongyoul I could not run your PR; it seems to work only in your environment. There are several issues, one of which is:
    * HADOOP_CONF_DIR is not in the zeppelin server classpath, so it could not connect to my yarn cluster. I believe it works in your environment because you use the default yarn configuration.
     
    Besides that, I still have concerns about this approach:
    1. You launch the Remote Interpreter Process as a yarn AM, and then create the Spark Interpreter through thrift in yarn-cluster mode. It seems you don't use spark-submit, which means SparkSubmit.scala is never called, and this could cause problems.
    2. The other thing I worry about is that you now launch the spark app on a remote machine (the Yarn AM). But that remote machine may not have the spark configuration (like spark-defaults.conf, hive-site.xml). It also does not seem possible to run multiple versions of spark with this approach.
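
The classpath issue in the bullet above can be illustrated with a small helper that appends `HADOOP_CONF_DIR` to a classpath string when the variable points at a real directory. This is a sketch under assumptions, not Zeppelin's launcher code: the function name and the way the classpath is passed around are hypothetical.

```python
import os
from typing import Mapping, Optional


def classpath_with_hadoop_conf(classpath: str,
                               env: Optional[Mapping[str, str]] = None) -> str:
    """Append HADOOP_CONF_DIR to a classpath string if it names a directory."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR", "")
    if not conf_dir or not os.path.isdir(conf_dir):
        # Nothing to add: a client started with this classpath would not
        # find a site-specific yarn-site.xml there.
        return classpath
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    if conf_dir not in entries:
        entries.append(conf_dir)
    return os.pathsep.join(entries)
```

When the variable is unset, the classpath is returned unchanged, which matches the situation described above where only the default yarn configuration is visible.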
   
     



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @zjffdu I'm adding support for reading HADOOP_CONF_DIR. Concerning spark, for now I recommend using local[*] or yarn-client. For yarn-cluster, I think the native approach would be better. I'll push my new changes soon.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @jongyoul Thanks for the quick response. I think we could support yarn-cluster mode in 2 steps:
    1. Support yarn-cluster mode for the spark interpreter via native spark-submit.
    2. Support yarn-cluster mode for other interpreters via a general framework.
   
    Since the spark interpreter is the most important interpreter of zeppelin, we can do step 1 first. What do you think?




[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @jongyoul If you consider using spark's native yarn-cluster support, you may need to check ZEPPELIN-1263 (#1446): in yarn-cluster mode, we have to set the configuration before launching the spark app, otherwise some configuration will not take effect.
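
To make the "set configuration before launching" point concrete, the sketch below assembles a `spark-submit` command line for yarn-cluster mode with every property passed as `--conf` at launch time. It is a minimal illustration, not what ZEPPELIN-1263 implements: the helper name, the main class, and the jar path are hypothetical, and only the `spark-submit` options `--master`, `--deploy-mode`, `--class`, and `--conf` are real.

```python
from typing import Dict, List


def yarn_cluster_submit_argv(main_class: str,
                             app_jar: str,
                             spark_conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit command line for yarn-cluster mode."""
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
    ]
    # Every property goes on the command line, so it is known at launch
    # time; keys are sorted only to keep the output stable.
    for key in sorted(spark_conf):
        argv += ["--conf", f"{key}={spark_conf[key]}"]
    argv.append(app_jar)
    return argv


argv = yarn_cluster_submit_argv(
    "org.example.InterpreterMain",   # hypothetical main class
    "/path/to/interpreter.jar",      # hypothetical jar path
    {"spark.executor.memory": "2g", "spark.driver.memory": "1g"},
)
print(" ".join(argv))
```

A setting such as `spark.driver.memory` shows why timing matters: in cluster mode the driver JVM is started by the launch itself, so changing the value afterwards cannot affect it.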



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user Tagar commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    Great job. Exciting to see this new feature in Zeppelin.
    My two cents:
   
    > through thrift in yarn-cluster mode
   
    Cloudera's Spark doesn't include the thrift service, and Cloudera doesn't recommend using it (they cite security as one of the concerns there).
    So I also think spark-submit would be a better option (whether it's on a remote or local machine, yarn-cluster or yarn-client).
   
    > But that remote machine may not have the spark configuration (like spark-defaults.conf, hive-site.xml).
   
    Not sure how big of a concern this is. We normally keep all hadoop-related configuration files identical across all hadoop servers and gateway servers (Cloudera Manager manages this for us).



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @zjffdu Personally, I agree with your opinion. Before I started, I had many discussions about how to support yarn-cluster mode, including for spark, and I concluded that it's not that weird to launch spark in yarn-client mode inside a yarn cluster. Actually, I expected this kind of reaction while working on this. So I'll finish this ASAP and do that as the next step.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @Tagar Thanks for reaching out. In the case of Spark, I feel we need to focus more on external spark mode. As for `spark-defaults.conf`, AFAIK it may be missing because we don't need to deploy spark on every node of the cluster to use it.



[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    > Cloudera's Spark don't have thrift service and Cloudera doesn't recommend to use that (they quote security is one of the concerns there).
   
    @Tagar Sorry for the confusion. The thrift service here is not the spark thrift server; it is the thrift protocol between the zeppelin server and the zeppelin interpreter process.




[GitHub] zeppelin issue #2329: [WIP][PoC] ZEPPELIN-2040 ClusterManager to support lau...

zjffdu
In reply to this post by zjffdu
Github user Tagar commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @zjffdu @jongyoul got it, thanks for the prompt response.



[GitHub] zeppelin issue #2329: [WIP][ZEPPELIN-2040] ClusterManager to support launchi...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @jongyoul What is your current design and plan? Since this is a big change, it might be better to reach consensus on the design first; otherwise your time would be wasted if we find issues after you complete the implementation. Overall, I have 2 concerns and suggestions:
    1. Use spark-submit to launch spark's InterpreterProcess instead of using the yarn api. Using the yarn api means reinventing the wheel of spark's yarn module, and it would take a lot of time to keep the behavior consistent with spark's yarn module, as there are many tricky details and configurations.
    2. How do we specify the port when launching the InterpreterProcess in yarn cluster mode? I mentioned this in ZEPPELIN-2035, and I think we need to finish ZEPPELIN-2035 first.
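
The port question in point 2 arises because the host is not known until yarn places the container, so a fixed port cannot be checked in advance. One common technique in that situation is to bind to port 0 and read back the port the OS assigned. The sketch below shows only that technique; it is an assumption for illustration and not a statement of what ZEPPELIN-2035 specifies, and the function name is hypothetical.

```python
import socket


def bind_free_port(host: str = "0.0.0.0") -> socket.socket:
    """Bind a listening TCP socket to an OS-assigned free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS to choose a free port
    server.listen()
    return server


server = bind_free_port("127.0.0.1")
host, port = server.getsockname()
# The (host, port) pair would still have to be reported to whoever needs
# to connect; how that is done is outside this sketch.
print(host, port)
server.close()
```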
   




[GitHub] zeppelin issue #2329: [WIP][ZEPPELIN-2040] ClusterManager to support launchi...

zjffdu
In reply to this post by zjffdu
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2329
 
    @jongyoul I understand this is for all interpreters. My concern is that this approach would not work for spark: many spark configurations would not take effect (e.g. spark.files, spark.jars, keytab, principal, etc.). And since the spark interpreter is the most important interpreter of zeppelin, I would suggest implementing cluster mode for spark first, via spark-submit.
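
For reference, the four settings named above each have a dedicated `spark-submit` option (`--files`, `--jars`, `--keytab`, `--principal`). The sketch below maps a settings dictionary onto those options; the dictionary keys, the function name, and the sample values are illustrative assumptions, and nothing here is taken from the PR.

```python
from typing import Dict, List

# Settings named in the comment above, mapped to the spark-submit options
# that carry them. The dictionary keys are illustrative names only.
FLAG_FOR_SETTING: Dict[str, str] = {
    "spark.files": "--files",
    "spark.jars": "--jars",
    "keytab": "--keytab",
    "principal": "--principal",
}


def submit_time_flags(settings: Dict[str, str]) -> List[str]:
    """Translate the settings that are present into spark-submit options."""
    flags: List[str] = []
    for name, flag in FLAG_FOR_SETTING.items():
        value = settings.get(name)
        if value:
            flags += [flag, value]
    return flags


print(submit_time_flags({
    "spark.jars": "a.jar,b.jar",        # hypothetical jar list
    "principal": "user@EXAMPLE.COM",    # hypothetical principal
}))
```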

