Categories: LLM

Fix RuntimeError: unscale_() has already been called on this optimizer since the last update().

When fine-tuning a model with LoRA and the HuggingFace Transformers library, I received this error:

RuntimeError: unscale_() has already been called on this optimizer since the last update().

This error is caused by using the development version transformers-4.31.0.dev0. The solution is to revert to transformers 4.30.2 with:

pip install transformers==4.30.2
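To see why the message appears, here is a minimal stdlib-only mock of the one-unscale_-per-update contract that PyTorch's GradScaler enforces. This is an illustrative sketch, not the real torch API: the class name and methods below only imitate the behavior that produces the error.

```python
class MockGradScaler:
    """Illustrative mock: unscale_() may run only once per optimizer update."""

    def __init__(self):
        self._unscaled = False

    def unscale_(self, optimizer):
        # A second unscale_() before the next update raises, just like the
        # error message in the post.
        if self._unscaled:
            raise RuntimeError(
                "unscale_() has already been called on this optimizer "
                "since the last update()."
            )
        self._unscaled = True

    def step(self, optimizer):
        # The step/update resets the flag, allowing the next unscale_().
        self._unscaled = False


scaler = MockGradScaler()
scaler.unscale_("opt")
scaler.step("opt")      # fine: the update resets the flag
scaler.unscale_("opt")  # fine again after the update
```

Presumably the dev Trainer ended up calling unscale_() twice in one step, which the stable 4.30.2 release does not do.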
Categories: Machine Learning

Solve unsupported GNU version! gcc versions later than 11 are not supported!

When installing a Python module such as AutoGPTQ, you may get this error:

/usr/local/cuda/include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
        132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
            |  ^~~~~
      error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
      [end of output]

To solve this, install GCC at the maximum supported version and point nvcc to it:

MAX_GCC_VERSION=11

sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++
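The check in host_config.h boils down to comparing the GCC major version against the CUDA release's cap. A small sketch of that logic (the cap of 11 matches the error above; the function name is mine, not part of CUDA):

```python
def gcc_supported(version: str, max_major: int = 11) -> bool:
    """Mirror nvcc's host compiler check: the GCC major version
    must not exceed the CUDA release's maximum."""
    major = int(version.split(".")[0])
    return major <= max_major


print(gcc_supported("11.4.0"))  # True
print(gcc_supported("12.3.0"))  # False: this is the case that triggers the error
```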
Categories: Deep Learning

Increase NVIDIA RTX 4090 Fan Speed to Reduce High Temperature

An RTX 4090 at full load under machine learning training can produce a lot of heat; it can reach 80-85 degrees Celsius. Using big industrial fans to cool the GPU and opening the PC case can reduce it to around 70 degrees Celsius.

However, before going down that path, you can raise your NVIDIA GPU fan speed from the default 30% to 90% or even 100%. Here are the steps to do it in Ubuntu.

First, you need to configure X11:

sudo vim /etc/X11/xorg.conf

Add Option "Coolbits" "4" in the NVIDIA Device section:

Section "Device"
     Identifier      "Device0"
     Driver          "nvidia"
     VendorName      "NVIDIA"
     Option          "Coolbits" "4"
EndSection

Reboot your PC to apply the new changes

The second step is to adjust the fan speeds. I usually use Psensor to monitor the fan speed. The RTX 4090 has two fans, so you need to tune both of them.
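Once Coolbits is enabled, both fans can also be set from the command line with nvidia-settings. The helper below just assembles that command as a sketch; the GPUFanControlState and GPUTargetFanSpeed attribute names come from nvidia-settings, while the 80% value, function name, and fan ids are my example assumptions:

```python
def fan_speed_command(gpu: int = 0, fan_ids=(0, 1), speed: int = 80):
    """Build an nvidia-settings invocation that pins each fan to `speed` percent."""
    # GPUFanControlState=1 switches the GPU to manual fan control.
    args = ["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUFanControlState=1"]
    for fan in fan_ids:
        args += ["-a", f"[fan:{fan}]/GPUTargetFanSpeed={speed}"]
    return args


# On a machine with the NVIDIA driver and an X session, run it with subprocess:
# subprocess.run(fan_speed_command(speed=80), check=True)
print(" ".join(fan_speed_command()))
```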

Categories: Deep Learning

Solve ImportError: cannot import name ‘Linear8bitLt’ from quantization

When running --quantize llm.int8 in the adapter for Lit-LLaMA, I got this error:

ImportError: cannot import name 'Linear8bitLt' from 'lit_llama.quantization'

The first step is to make sure bitsandbytes itself is running well:

python -m bitsandbytes

And I received

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
...
packages/bitsandbytes/functional.py", line 12, in <module>
    from scipy.stats import norm
ModuleNotFoundError: No module named 'scipy'

Now I know the problem: scipy is not installed. The solution is to install scipy:

pip install scipy

And I re-ran bitsandbytes again to confirm the fix.

Categories: Ubuntu

Solving Files and Folders Suddenly Disappearing on Ubuntu

I’m using Ubuntu 23.04 Cinnamon, the latest in 2023, and after moving files, the Nautilus file manager suddenly showed this error:

Couldn't open file. No program to open the file

And suddenly the file, the folder, and all directories inside it were gone.

If you have this problem, don’t panic. To solve it, reboot Ubuntu. Once you log in, check the Trash and the files will be there!

Categories: Machine Learning

Install Transformers, PyTorch and TensorFlow on Ubuntu 2023

Installing Transformers, PyTorch and TensorFlow with working GPU support on the latest Ubuntu requires several steps. This is how I successfully set it up and ran several models with it.

Please make sure to install the latest NVIDIA driver. I use an RTX 4090 in this case. This is the link: https://www.nvidia.com/download/driverResults.aspx/200481/en-us/

If you are using the nouveau driver, you can disable it via:

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"

sudo update-initramfs -u
sudo reboot
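After the reboot, and once PyTorch is installed, a quick sanity check for GPU visibility can be sketched like this. The helper is my own: it returns None when torch is not installed, so it is safe to run anywhere:

```python
def cuda_available():
    """Return torch.cuda.is_available(), or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


# True means the driver and CUDA-enabled PyTorch build see the GPU.
print(cuda_available())
```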
Categories: Machine Learning

Install Stable Diffusion Automatic1111, Torch 2.0 and Fix RTX 4090 Performance

I use a clean installation of Ubuntu 23.04 Lunar Lobster and NVIDIA driver 525. If you already have the driver installed, here are the steps to improve Automatic1111 Stable Diffusion performance to 40-44 it/s.

  1. Install Anaconda

Ubuntu 23.04's default Python version is 3.11. In this case, I will use Anaconda to provide Python 3.10. Download Anaconda and install it:

chmod a+x Anaconda3-2023.03-1-Linux-x86_64.sh
./Anaconda3-2023.03-1-Linux-x86_64.sh
Categories: Machine Learning

Solve Pandas Drop Duplicates still not unique in Value Counts

When using pandas drop_duplicates, we may still encounter rows that are duplicated, which can be checked via:

df.column_name.value_counts()

This is not a pandas bug: drop_duplicates compares all columns by default, so rows that share a key but differ in any other column are kept. To remove duplicate rows and produce 100% unique values based on an index or key column, you can use this:

df_unique = df_unique.drop(df_unique[df_unique["key_column_name"].duplicated()].index)
df_unique.key_column_name.value_counts()
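A more idiomatic equivalent, assuming the goal is keep-first uniqueness on the key column, is drop_duplicates with the subset parameter, which restricts the comparison to that column. The frame below is a hypothetical example:

```python
import pandas as pd

# Hypothetical frame where the key repeats but other columns differ,
# which is exactly the case a plain drop_duplicates() does not catch.
df = pd.DataFrame({"key_column_name": [1, 1, 2], "value": ["a", "b", "c"]})

# subset= compares only the key column; keep="first" mirrors the
# duplicated()-based filter above.
df_unique = df.drop_duplicates(subset="key_column_name", keep="first")
print(df_unique)
```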
Categories: Machine Learning

Install LightGBM with GPU on Linux Ubuntu

LightGBM can work faster on a GPU. In PyCaret, I passed the parameter use_gpu=True to TSForecastingExperiment() and got these errors:

[LightGBM] [Fatal] GPU Tree Learner was not enabled in this build.
Please recompile with CMake option -DUSE_GPU=1
[LightGBM] [Fatal] GPU Tree Learner was not enabled in this build.
Please recompile with CMake option -DUSE_GPU=1
[LightGBM] [Fatal] GPU Tree Learner was not enabled in this build.
Please recompile with CMake option -DUSE_GPU=1

To enable this, we need to uninstall the current LightGBM and reinstall it with GPU support. For Linux Ubuntu, it's best to install the prerequisite packages first:

sudo apt install cmake build-essential libboost-all-dev

Make sure you already have the NVIDIA CUDA Toolkit installed:

sudo apt install nvidia-cuda-toolkit

The first option is installation inside a conda environment:

pip uninstall lightgbm -y

conda install -c conda-forge gcc=12.1.0
pip install lightgbm --config-settings=cmake.define.USE_GPU=ON --config-settings=cmake.define.OpenCL_INCLUDE_DIR="/usr/local/cuda/include/" --config-settings=cmake.define.OpenCL_LIBRARY="/usr/local/cuda/lib64/libOpenCL.so"

The second option is installation outside of an environment:

# Get the LightGBM source.
git clone --recursive https://github.com/Microsoft/LightGBM.git
cd LightGBM
mkdir build && cd build
# Run cmake, specifying the locations of the OpenCL files.
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
# Compile.
make -j4
# Install for Python, using what we just compiled.
cd ../python-package
python setup.py install --precompile
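After reinstalling, the GPU build is selected per model via LightGBM's device parameter. The sketch below collects the relevant parameters in one place; the platform and device ids of 0 are my assumption for a single-GPU machine:

```python
def lightgbm_gpu_params():
    """Parameters that select LightGBM's GPU tree learner."""
    # "device": "gpu" picks the OpenCL tree learner the rebuild enabled;
    # the ids choose the OpenCL platform and the GPU within it.
    return {"device": "gpu", "gpu_platform_id": 0, "gpu_device_id": 0}


# e.g. lgb.LGBMRegressor(**lightgbm_gpu_params()) on a machine with the GPU build
print(lightgbm_gpu_params())
```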
Categories: Windows

Install Stable Diffusion on Windows and Fix RTX Performance 2023

There has been a lot of feedback about NVIDIA RTX performance after installing Stable Diffusion Automatic1111. I will explain a simple way to install it and fix RTX 4090 performance within 5 minutes.

First, make sure you have Python 3.10 in your Windows. You can use Anaconda or native Python installation.

  1. Clone the Stable Diffusion git repository to your local directory
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

2. Install Stable Diffusion with xformers

This part is tricky. By default, it will install Torch 2.1.0; however, the latest xformers requires Torch 2.0, and you will later encounter problems like:

AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

The solution for installing both xformers and a compatible Torch inside Stable Diffusion is to pass the argument at launch:

webui.bat --xformers