by Incrust Software
Introduction:
Language is at the heart of culture, and culture is the glue of society; without language, culture could not be passed from one generation to the next. Language is a means of communicating thoughts, ideas, and concepts. Through it, ideas are conveyed from one person to another, from one place to another, and from the past to the present, and they are recorded for the future. Many languages are used for communication today. Arabic is one of them, and it is a key language in the Gulf countries.
It should be emphasized that language is independent of thought: thinking precedes language, produces it, and keeps expanding its depth and breadth. Arabs have always taken pride in their language and use Arabic as their primary means of communication. In Arabic-speaking countries, much of the data, whether on paper or in digital form, is in Arabic. This Arabic data needs to be stored and displayed exactly as written on different digital platforms.
Show Arabic data on a website:
Most dynamic websites support Arabic text. To display Arabic correctly, a page must declare the character encoding it uses. Unicode, encoded as UTF-8, is usually all that is needed for Arabic on the web: save your files and serve your pages as UTF-8, and Arabic text can be placed directly inside ordinary HTML tags. For HTML output, add this to your HEAD section (in HTML5, the shorter <meta charset="UTF-8"> is equivalent):
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
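As a minimal sketch (the function name renderArabicParagraph() is hypothetical), a PHP page can also send the charset in the HTTP Content-Type header and mark Arabic blocks as right-to-left with the standard HTML dir and lang attributes:

```php
<?php
// Hypothetical helper: wraps UTF-8 Arabic text in a right-to-left paragraph.
function renderArabicParagraph(string $text): string
{
    // Escape HTML special characters; the input is assumed to be UTF-8.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // dir="rtl" lays the text out right to left; lang="ar" marks it as Arabic.
    return '<p dir="rtl" lang="ar">' . $safe . '</p>';
}

// Declare the encoding in the HTTP header as well (only possible before any output).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo renderArabicParagraph('مرحبا');
```

The header and the meta tag should agree; if they conflict, browsers give the HTTP header priority.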
Store Arabic data in a database:
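To show data on a website, the data must first be stored somewhere, usually in a database (e.g. MySQL, SQL Server, Oracle). Taking MySQL as an example, create your database with UTF-8 encoding, or at least give the relevant tables or columns UTF-8 encoding. In MySQL, prefer the utf8mb4 character set: the older utf8 character set (an alias for utf8mb3) stores at most three bytes per character, which is enough for Arabic but not for characters such as emoji. The connection between PHP and MySQL must also use utf8mb4; otherwise Arabic text can be garbled on its way into or out of the database. With both in place, Arabic data is stored in the database as it is.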
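A minimal sketch of the MySQL side, assuming the users table that the import code later in this article writes to (the column sizes, the constant CREATE_USERS_TABLE, and the helper name connectUtf8mb4() are illustrative assumptions): the table is created with utf8mb4, and the mysqli connection is switched to utf8mb4 with set_charset().

```php
<?php
// Assumed table layout: column names match the INSERT used later; sizes are illustrative.
// VARCHAR(255) leaves room for password_hash() output.
const CREATE_USERS_TABLE = <<<'SQL'
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    userName VARCHAR(100) NOT NULL,
    password VARCHAR(255) NOT NULL,
    firstName VARCHAR(100) NOT NULL,
    lastName VARCHAR(100) NOT NULL
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
SQL;

// Hypothetical helper: opens a mysqli connection that exchanges text as utf8mb4.
function connectUtf8mb4(string $host, string $user, string $pass, string $db): mysqli
{
    // Report errors as exceptions instead of return values.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $conn = new mysqli($host, $user, $pass, $db);
    // Without this, Arabic text may be garbled between PHP and MySQL.
    $conn->set_charset('utf8mb4');
    return $conn;
}
```

Usage would look like $conn = connectUtf8mb4('localhost', 'app', 'secret', 'mydb'); followed by $conn->query(CREATE_USERS_TABLE); with your own credentials.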
Parsing an Arabic CSV file using PHP:
These are the basic steps to parse an Arabic CSV file:
-Open the CSV file in any editor
-Re-save it as a .csv file with UTF-8 encoding
-Upload it with a script
-Skip empty rows while reading the CSV
-Store the data in a database table
-Read all imported data back from the MySQL database
Note: The CSV file should be saved as UTF-8, and the MySQL table should also use UTF-8 (utf8mb4). utf8_encode() is not a substitute for a UTF-8 file: it assumes its input is ISO-8859-1 (Latin-1), which has no Arabic characters. When the file is already UTF-8, the readUserRecords() function below passes each Arabic name through utf8_encode(), which double-encodes it, and insertUser() then reverses this with utf8_decode() before saving. The two calls cancel each other out, so the original UTF-8 bytes are what reach the database. If the CSV file is not UTF-8, convert it first, either manually in an editor or in code (see the challenges section).
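A minimal sketch of checking and converting the file's encoding in code. The helper name toUtf8() is hypothetical, and the ISO-8859-6 fallback is an assumption; pass whatever legacy encoding your files actually use.

```php
<?php
// Hypothetical helper: returns file contents as UTF-8.
// $fallback is the encoding assumed when the bytes are not valid UTF-8.
function toUtf8(string $raw, string $fallback = 'ISO-8859-6'): string
{
    // Strip a UTF-8 byte order mark, which some editors add when saving as UTF-8.
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        $raw = substr($raw, 3);
    }
    // Already valid UTF-8: return it unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise convert from the assumed legacy encoding.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}
```

For example, toUtf8(file_get_contents($fileName)) would give UTF-8 text whether the upload was saved as UTF-8 (with or without a BOM) or in the fallback encoding.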
utf8_encode in PHP:
The utf8_encode() function encodes an ISO-8859-1 string to UTF-8.
Unicode is a universal standard that assigns every character of every language, plus many symbols, a unique number called a code point.
A code point is only a number, so it has to be encoded as bytes before it can be stored in a file or sent to another computer. UTF-8 is the most common such encoding: it stores each code point as one to four bytes. ASCII characters take one byte, and Arabic letters (U+0600 to U+06FF) take two.
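To see the difference between characters and bytes, this short sketch (assuming the mbstring extension is enabled and the script is saved as UTF-8) measures an Arabic word both ways:

```php
<?php
$word = 'مرحبا'; // "marhaba", five Arabic letters

// strlen() counts bytes: each Arabic letter takes two bytes in UTF-8.
$bytes = strlen($word);
// mb_strlen() counts characters (code points) in the given encoding.
$chars = mb_strlen($word, 'UTF-8');
// The first letter, meem (U+0645), shown as its UTF-8 bytes in hex.
$meemHex = bin2hex(mb_substr($word, 0, 1, 'UTF-8'));

echo "$bytes bytes, $chars characters, first letter = 0x$meemHex\n";
// prints "10 bytes, 5 characters, first letter = 0xd985"
```

This is why byte-oriented functions such as substr() can cut an Arabic letter in half, while the mb_* functions work on whole characters.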
utf8_decode:
The utf8_decode() function decodes a UTF-8 string to ISO-8859-1.
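This function decodes a string previously encoded with utf8_encode() back to ISO-8859-1. Both utf8_encode() and utf8_decode() are deprecated as of PHP 8.2; mb_convert_encoding() with 'ISO-8859-1' as the source or target encoding performs the same conversions.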
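A small sketch of what this pair does to Arabic text, using the mb_convert_encoding() equivalents of the deprecated functions (the wrapper names latin1ToUtf8() and utf8ToLatin1() are hypothetical):

```php
<?php
// Same conversion as utf8_encode(): treat every byte as an ISO-8859-1 character.
function latin1ToUtf8(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// Same conversion as utf8_decode(): map characters U+0000 to U+00FF back to single bytes.
function utf8ToLatin1(string $s): string
{
    return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
}

$arabic = 'مرحبا';                   // 10 bytes of UTF-8
$doubled = latin1ToUtf8($arabic);    // each byte >= 0x80 becomes two bytes (mojibake)
$restored = utf8ToLatin1($doubled);  // the round trip restores the original bytes

echo strlen($doubled), "\n";         // 20
var_dump($restored === $arabic);     // bool(true)
```

Because the second call undoes the first, dropping both calls (and keeping the file in UTF-8) stores exactly the same bytes.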
The readUserRecords() function performs this CSV parsing:
function readUserRecords()
{
    $output = array();
    $importCount = 0;
    $fileName = $_FILES["file"]["tmp_name"];
    if ($_FILES["file"]["size"] > 0) {
        $file = fopen($fileName, "r");
        while (($column = fgetcsv($file, 10000, ",")) !== FALSE) {
            if (! empty($column) && is_array($column)) {
                // Skip rows whose cells are all empty
                if ($this->hasEmptyRow($column)) {
                    continue;
                }
                if (isset($column[1], $column[2], $column[3], $column[4])) {
                    $userName = $column[1];
                    $password = $column[2];
                    // utf8_encode() treats its input as ISO-8859-1; for UTF-8 Arabic text
                    // this double-encodes it, and insertUser() undoes it with utf8_decode()
                    $firstName = utf8_encode(trim($column[3]));
                    $lastName = utf8_encode(trim($column[4]));
                    $insertId = $this->insertUser($userName, $password, $firstName, $lastName);
                    if (! empty($insertId)) {
                        $output["type"] = "success";
                        $output["message"] = "Import completed.";
                        $importCount ++;
                    }
                }
            } else {
                $output["type"] = "error";
                $output["message"] = "Problem in importing data.";
            }
        }
        fclose($file);
    }
    if ($importCount == 0) {
        $output["type"] = "error";
        $output["message"] = "Duplicate data found.";
    }
    return $output;
}
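readUserRecords() calls $this->hasEmptyRow(), which is not shown in this article. A hypothetical standalone version might treat a row as empty when every cell is blank after trimming; note that fgetcsv() returns array(null) for a blank line.

```php
<?php
// Hypothetical standalone version of the hasEmptyRow() helper.
// Returns true when every cell of a CSV row is null, empty, or whitespace.
function hasEmptyRow(array $column): bool
{
    foreach ($column as $cell) {
        // (string) turns null (from a blank line) into ""
        if (trim((string) $cell) !== '') {
            return false;
        }
    }
    return true;
}
```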
The above PHP script reads an Arabic CSV file encoded in UTF-8. Before the Arabic data is saved to the database, insertUser() decodes it with utf8_decode(), which reverses the utf8_encode() call made while reading:
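function insertUser($userName, $password, $firstName, $lastName)
{
    // Reverse the utf8_encode() calls made in readUserRecords()
    $firstName = utf8_decode($firstName);
    $lastName = utf8_decode($lastName);
    $hashedPassword = password_hash($password, PASSWORD_DEFAULT);
    $sql = "INSERT INTO users (userName, password, firstName, lastName) VALUES (?, ?, ?, ?)";
    $paramType = "ssss";
    $paramArray = array(
        $userName,
        $hashedPassword,
        $firstName,
        $lastName
    );
    // $this->conn is the project's own database class; insert() is assumed to return the new row id
    $insertId = $this->conn->insert($sql, $paramType, $paramArray);
    // Return the id so readUserRecords() can count successful imports
    return $insertId;
}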
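$this->conn->insert() belongs to the project's own database class. A sketch of the same insert with plain mysqli prepared statements, assuming a connection already set to utf8mb4 and mysqli error reporting in strict (exception) mode; the function name insertUserMysqli() is hypothetical. With the connection charset set, the UTF-8 names can be bound as they are, with no utf8_encode()/utf8_decode() round trip:

```php
<?php
// Hypothetical mysqli version of insertUser(); $conn must use the utf8mb4 charset.
// Assumes mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT), so failures throw.
function insertUserMysqli(mysqli $conn, string $userName, string $password,
                          string $firstName, string $lastName): int
{
    $hashedPassword = password_hash($password, PASSWORD_DEFAULT);
    $stmt = $conn->prepare(
        'INSERT INTO users (userName, password, firstName, lastName) VALUES (?, ?, ?, ?)'
    );
    // Four string parameters; the Arabic names are passed as UTF-8.
    $stmt->bind_param('ssss', $userName, $hashedPassword, $firstName, $lastName);
    $stmt->execute();
    // Auto-increment id generated by this statement.
    $id = (int) $stmt->insert_id;
    $stmt->close();
    return $id;
}
```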
Challenges in parsing Arabic CSV files:
Parsing Arabic CSV files comes with some challenges. If the CSV file is not UTF-8 encoded, we have to convert it to UTF-8 first.
Example: if a CSV file is UTF-16 encoded (some spreadsheet exports use UTF-16), we have to convert it into a UTF-8 encoded file. For this conversion, we use the mb_convert_encoding function.
Syntax: mb_convert_encoding(array|string $string, string $to_encoding, array|string|null $from_encoding = null): array|string|false.
$data = mb_convert_encoding($data, 'UTF-8', 'UTF-16');
mb_convert_encoding converts a string from one character encoding to another: it converts $string from from_encoding (or, if that is omitted, the current internal encoding) to to_encoding. If $string is an array, all of its string values are converted recursively.
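mb_convert_encoding() needs to know the byte order of UTF-16 data. A minimal sketch that reads the byte order mark (BOM), strips it, and converts; the helper name utf16ToUtf8() is hypothetical, and falling back to little-endian when no BOM is present is an assumption to adjust for your own files:

```php
<?php
// Hypothetical helper: converts UTF-16 text (with or without a BOM) to UTF-8.
function utf16ToUtf8(string $raw): string
{
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        // FF FE: little-endian byte order mark
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        // FE FF: big-endian byte order mark
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: assume little-endian (an assumption, not a rule)
    return mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
}
```

Stating the byte order explicitly avoids guessing, and stripping the BOM keeps an invisible U+FEFF from ending up at the start of the first header field.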
Example of a difficult (UTF-16) CSV file containing Arabic data:
public function import_csv()
{
    // $mid (a random id) and $target_dir (the upload directory) are assumed to be set elsewhere
    $myfile_name = $mid . "_" . $_FILES['file']['name'];
    $target_file1 = $target_dir . $myfile_name;
    // $target_file1: path of the uploaded CSV file
    $extg = strtolower(substr($_FILES['file']['name'], strrpos($_FILES['file']['name'], '.') + 1));
    $basefile_name = basename($_FILES['file']['name'], '.' . $extg);
    // $basefile_name: CSV file name without its extension
    $myfile_name2 = $mid . "_" . $basefile_name . ".txt";
    $target_file2 = $target_dir . $myfile_name2;
    // $target_file2: text file with the same base name, used as the main working file
    move_uploaded_file($_FILES['file']['tmp_name'], $target_file1);
    // the temporary upload is moved to $target_file1
    $data = file_get_contents($target_file1);
    // $data holds the raw bytes of $target_file1
    $arabic = "";
    $data = mb_convert_encoding($data, 'UTF-8', 'UTF-16');
    // convert the raw data from UTF-16 to UTF-8
    if ($this->is_arabic($data) == 1) {
        // the converted data contains Arabic (or Hebrew) characters
        $arabic = "arabic";
        file_put_contents($target_file2, $data);
        // write the converted UTF-8 data to $target_file2, our main file
    } else {
        $arabic = "english";
    }
    $input = fopen($target_file1, "r");
    // open the original CSV file in read mode
    $cdt = 0; // line counter
    $myarrow = array(); // rows read from $target_file2
    $file_handle = fopen($target_file2, 'a+');
    // open $target_file2 in append mode (creates it if it does not exist)
    while (($line = fgets($input)) !== false) {
        // read the original CSV file line by line
        if ($arabic != "arabic") {
            // not Arabic: copy the original line unchanged to $target_file2
            fwrite($file_handle, $line);
        }
        if ($cdt == 0) {
            // the first line is the CSV header (still in the file's original encoding)
            $Rowlist2 = explode(",", $line);
        }
        $cdt++;
    }
    fclose($input);
    fclose($file_handle);
    // close both files
    $file_handle = fopen($target_file2, "r");
    // reopen $target_file2 in read mode
    while (($line = fgets($file_handle)) !== false) {
        // remove NUL bytes (left over when UTF-16 text was copied unconverted) and the line ending
        $line_read = rtrim(str_replace("\0", "", $line), "\r\n");
        // split the line into fields (explode() does not handle quoted commas)
        $Rowlist3 = explode(",", $line_read);
        array_push($myarrow, $Rowlist3);
    }
    fclose($file_handle);
    // $myarrow holds every row of $target_file2, including the header row
    return $myarrow;
}
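public function is_arabic($string)
{
    // U+0590 to U+05FF is the Hebrew block and U+0600 to U+06FF is the Arabic block,
    // so this matches any Hebrew or Arabic (right-to-left) character
    $rtl_chars_pattern = '/[\x{0590}-\x{05ff}\x{0600}-\x{06ff}]/u';
    // preg_match() returns 1 on a match, 0 otherwise (false on error)
    return preg_match($rtl_chars_pattern, $string);
}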
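Two possible refinements, sketched under assumptions (the function names isArabicText() and parseCsvString() are hypothetical). First, import_csv() splits lines with explode(), so a quoted field containing a comma is split in the wrong place; once the data is UTF-8, it can instead be parsed in memory with fgetcsv() on a php://temp stream, without the intermediate text file. Second, if only Arabic (not Hebrew) should count, PCRE's \p{Arabic} script property can replace the character ranges:

```php
<?php
// Hypothetical replacement for is_arabic(): true only for Arabic-script characters.
function isArabicText(string $s): bool
{
    // \p{Arabic} matches characters of the Arabic script; /u treats $s as UTF-8.
    return preg_match('/\p{Arabic}/u', $s) === 1;
}

// Hypothetical parser: splits UTF-8 CSV text into a header row and data rows.
function parseCsvString(string $utf8): array
{
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $utf8);
    rewind($stream);

    $header = null;
    $rows = array();
    // Length 0 = no line-length limit; the escape character is passed explicitly.
    while (($fields = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($fields === array(null)) {
            continue; // blank line
        }
        if ($header === null) {
            $header = $fields;
        } else {
            $rows[] = $fields;
        }
    }
    fclose($stream);
    return array('header' => $header, 'rows' => $rows);
}

$csv = "name,city\n\"أحمد, علي\",دبي\n";
$parsed = parseCsvString($csv);
```

Here the quoted name "أحمد, علي" stays in one field, which explode(",", ...) would have split in two.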